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Abstract. Linear scaling methods, or 0{N) methods, have computational and 

memory requirements which scale linearly with the number of atoms in the system, N, 
in contrast to standard approaches which scale with the cube of the number of atoms. 
These methods, which rely on the short-ranged nature of electronic structure, will allow 
accurate, ab initio simulations of systems of unprecedented size. The theory behind the 
locality of electronic structure is described and related to physical properties of systems 
to be modelled, along with a survey of recent developments in real-space methods 
which are important for efficient use of high performance computers. The linear 
scaling methods proposed to date can be divided into seven different areas, and the 
applicability, efficiency and advantages of the methods proposed in these areas is then 
discussed. The applications of linear scaling methods, as well as the implementations 
available as computer programs, are considered. Finally, the prospects for and the 
challenges facing linear scaling methods are discussed. 
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1. Introduction 

Electronic structure calculation methods based on the density functional theory (DFT) 
have been playing important roles in condensed matter physics for more than forty 
years. In the early stages, DFT calculations were employed mainly for the study of the 
electronic structure of simple solids, using a few atoms in a unit cell, with the use of 
periodic boundary conditions. Since then, there has been a huge effort to improve the 
accuracy and efficiency of the calculation techniques. In terms of efficiency, after the 
pioneering work by Car and Parrinello |T| , the size of the target systems has increased 
dramatically and more and more examples of the DFT studies, especially on aperiodic 
systems like surface structures, have emerged. DFT calculations on systems containing 
hundreds of atoms are currently ubiquitous. As the system size for DFT calculations 
has become larger, the variety of materials and phenomena investigated by the method 
has increased. The information of the total energy and atomic forces calculated by DFT 
methods can provide reliable data independently from experiments, and the methods are 
nowadays considered as one of the established research tools in many fields, like physics, 
chemistry, materials science, and many others. Recently, there have been DFT studies 
in the complex fields of nano-structured materials and biological systems. In the study 
of these classes of materials, we need to treat systems containing at least thousands of 
atoms. However, as is well known, once the number of atoms in a system reaches 
around one thousand, the cost of standard DFT calculations increases very rapidly as 
a cube of A^. To overcome this problem, the methods known as linear-scaling or 0{N) 
DFT methods have been developed |2]. The progress of these methods in the last ten 
to fifteen years is remarkable and the purpose of this review paper is to overview the 
recent progress of 0{N) DFT methods. 

We will start with an overview of the conventional DFT method and its advantages. 
In the normal DFT approach, we solve for the Kohn-Sham (KS) orbitals \E'j,fc(r), which 
are the eigenstates of the KS equation [s]. 



2m 



Here is the Kohn-Sham Hamiltonian, and v and k are the band index and k points 
in the first Brillouin zone, respectively. Hereafter, we omit k for clarity because we 
consider large systems and the number of k points is small. Kxt(r) is the potential from 
nuclei, V//(r) is the Hartree potential, and Vxciy) is the exchange-correlation potential in 
the Kohn-Sham formalism. The most accurate DFT calculations often use a plane-wave 
basis set to express the KS orbitals: 

^,(r)= c,(G)exp(zG-r) (2) 

|G|<G max 

A plane- wave basis set has two main advantages. First, the accuracy of the basis set can 
be systematically improved. In Eq. (|2|), Gmax is obtained from the cutoff energy Ecut 
as — = -Ecut- The number of plane-waves, Ng, is controlled only by the number 
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Ecut- The accuracy of the basis set can be improved simply by increasing -Ecut, and a 
variational principle with respect to E^ut is satisfied. The other advantage is that forces 
can be calculated easily without the Pulay correction term because the basis set is 
independent on atomic positions (though such basis-set-dependent corrections become 
necessary when changing the unit cell size or shape). These two advantages make it 
possible to calculate both energy and forces accurately with plane-wave basis sets. 

In order to realise accurate plane-wave calculations, we need to introduce several 
theoretical techniques. First of all, plane-wave calculations rely on the idea of 
pseudopotentials ji]. With this method, it is possible to work only with valence electrons 
and their pseudo-wavefunctions, which are much smoother than the real wave functions 
which oscillate strongly in the core region, and to replace the nuclear potential and 
the core electrons with a pseudopotential. There have been several kinds of techniques 
proposed to make pseudo-wavefunctions smoother (5]-[7j. Using the method of ultra-soft 
pseudopotentials [s], even the cutoff energy for the localised 3d orbitals of transition 
metals can be reduced dramatically. With these improvements in theoretical techniques, 
the total energy converges quickly with respect to the cutoff energy and this is essential 
to make the accurate DFT calculations feasible. In addition, the major part of the 
error in the total energy usually comes from the expression of KS orbitals in the core 
region. Hence, the relative energetic stability of two states (e.g. two different atomic 
structures) can be reproduced without the absolute convergence because most of the 
errors are cancelled in the energy difference. Note that it is also possible to reduce the 
number of plane-waves by using augmentation for the wavefunctions in the core region 
as in the linearised augmented plane-wave (LAPW) or the projector augmented wave 



It is essential that we reduce the number of plane-waves by introducing the 
pseudopotential or other similar techniques. However, even with very smooth pseudo 
wavefunctions, Nq is typically one hundred times larger than the number of electrons. 
When we want to diagonalise < G\H'^^\G' >, the required memory scales as 0{Nq) and 
CPU time as 0{Nq). Hence it is impossible to employ direct (exact) diagonalisation 
except for very small systems. Instead of using exact diagonalisation, we can obtain the 
Kohn-Sham orbitals by minimising the DFT total energy with respect to the coefficients 
{cu{G)}, as shown in the work by Car and Parrinello jlj. Since we only need the 
occupied Kohn-Sham orbitals in such iterative methods, the memory requirement to 
store {c,y{G)} is proportional to NbNg, which is roughly 100 times smaller than Nq. 
Then, we update the coefficients {cu{G)} by calculating the gradient of the total energy 
with a constraint to keep the KS orbitals orthogonal to each other. This is done by 
calculating {H^^ — K^y) or — e^ with Gram-Schmidt orthogonalisation of {cy{G)}. 
In the calculation of the Kohn-Sham Hamiltonian, we need to calculate the density n(r). 
For this, we first calculate n(r) as 



If we perform this in a straightforward way, we need the operations of 0{Nq) for each 



(PAW) method ^9]. 




(3) 
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band z/, and the total number of operations needed for the transformation from {cu{G)} 
to ^I/y(r) is of the order of NbNq, which is quite expensive. However, we can dramatically 
reduce the cost of the calculation using the fast Fourier transform (FFT) method, and 
the number of operations in Eq. ^ becomes A^siVc In(iVG). Although Eq. ^ is still 
the most expensive part in the calculations of small systems for many plane-wave DFT 
codes, the reduction of the computational cost by the FFT method is essential for the 
success of plane-wave DFT calculations. 

The orthogonalisation of the KS orbitals is also an expensive operation, which 
includes the calculations / dr\l/j,(r)\I'i,/(r) for all pairs of band indices {u, u'}. The total 
cost of the operations is 0{N'^Ng), but we can see that it is only proportional to A^^^. 
As we have seen, in the iterative method with the plane- wave basis set, there are no 
operations where the cost increases as fast as A"^. This is the reason why we can do 
efficient calculations even with large Nq- The iterative diagonalisation technique, FFTs, 
and ab initio pseudopotentials used in the plane-wave calculations are the key factors 
which make it possible to employ accurate but efficient DFT calculations. Using these 
techniques, with the increase of the computer power, the time for solving KS equations 
has become smaller and smaller, and the system size for the target of DFT studies has 
become larger and larger. There was a report already in 2002 of DFT calculations on 
a DNA system including hydrating water molecules, which consisted of 1194 atoms. 



including 138 water molecules 10 



However, this situation changed about 5-10 years ago. Recently, the growth of 
computer power mainly comes from the increase of the number of processors or cores, 
while the speed of each core or processor remains unchanged. The number of cores 
of the biggest supercomputers is currently reaching sub-millions. The Jaguar machine 
at Oak Ridge in the US has 224,162 cores, and the new Japanese supercomputer 'K' 
already has more than 700,000 cores. To utilize such computing power, it is essential to 
determine whether or not a technique or approach has good parallel efficiency. In this 
respect, the FFT method has a serious drawback. As is well known, the FFT needs all- 
to-all communication (i.e. each core communicating with all other cores) and the time 
required for communications will grow rapidly with the increase of cores or processors. 
As explained above, we cannot perform efficient plane-wave DFT calculations without 
the FFT technique. Thus, we need to introduce a different type of basis set which will 
be more suitable for parallel calculations. 

In addition, there is another serious obstacle to increasing the system size in DFT 
calculations. When the number of atoms exceeds a few hundred, the orthogonalisation of 
the Kohn-Sham eigenstates becomes the most expensive operation instead of the FFT. 
The CPU time for the FFT part is proportional to A"bA"g log(A"G) and it increases as 
0{N'^), since both Nb and A^^ are proportional to the number of atoms A^. On the 
other hand, the CPU time for orthogonalisation increases as N^Ng, which is 0{N^). 
Once this part becomes the most expensive part, it is very difficult to make the system 
size larger. 

From our brief survey of the field, we can see two key points which must be overcome 
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to realize efficient DFT calculations on extremely large systems and massively parallel 
computers. 

• Develop a method to calculate electronic structure which is suitable for massively 



parallel calculations 11 , 12 



Solve the electronic structure with better scaling than 0{N^), ideally with linear- 
scaling. 

We note that there have been considerable efforts in this area within standard DFT 



codes, often focussing on molecular dynamics 13 , 14 . However, there are limits to these 
efforts, and so we consider alternatives. For the ffist point, real-space methods are 
considered to have an advantage. There has been a concerted effort to develop practical 
real-space methods and great progress has been achieved in the last decade. Several 
DFT codes using this technique are now available to researchers. Here, it is essential to 
understand whether these new methods keep the advantage of plane-wave methods or 
not, such as high accuracy, ease of calculating atomic forces, and systematic convergence. 
Regarding the second point, there were already several proposals for 0{N) methods 
more than ten years agc[H where the cost of the calculation is only proportional to A^. 
Within empirical tight-binding (TB) methods, there have been a significant number of 



applications using such linear-scaling techniques 15 , 16 . However, there were almost 
no examples, until very recently, where linear-scaling DFT methods were used for the 
purpose of actual scientific research. To replace conventional DFT methods with a 
new method, it is necessary for the method to have the same accuracy and stability as 
standard methods, and reasonable efficiency. Compared with empirical TB methods, 
DFT calculations are more complex and have many potential sources of instability, 
especially in large-scale calculations. In this respect, the success of the ffist point is 
important also for the second point. Plane- wave DFT methods have been under intense 
development for over twenty five years and are widely used; competing for efficiency is 
therefore difficult for linear scaling methods, except for very large (thousands of atoms) 
systems. Identifying problems which require systems of this size can be a challenge, 
particularly to researchers used to the constraints of cubic scaling codes. 

The main purpose of this paper is to review the recent progress of the 0{N) 
methods. However, following our discussion above, we first survey recent progress in 
real-space methods. We then turn to the localisation of electronic structure, ffist for 
Wannier functions and then the density matrix. In the major part of the review, we 
survey linear scaling methods and related developments in seven different areas and 
consider extensions to standard DFT. Technical details (including non-orthogonality, 
electron number, parallelisation and sparse matrices) are dealt with in a separate section 
which is mainly intended for practitioners in the field; however, it is important to note 
that high parallel efficiency is a key criterion for a successful linear scaling code. Finally, 



I The recursion method, described in Sec. 3.2.3 dates back to the 1970s, while the first hnear scahng 



approaches were proposed in the early 1990s, for instance divide-and-conquer (Sec. 3.2.2) and DMM 



and OMM methods (Sec. 3.2.1) 
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we describe various implementations of the methods, as well as applications of linear 
scaling DFT before concluding with a survey of the challenges facing the field. 



2. Real-space methods 

As touched on above, the use of real-space methods both for efficient parallelisation 
and for modelling larger systems is well established, and has been reviewed elsewhere in 



detail 17 191. It is also used extensively for combined quantum mechanical/molecular 



mechanical (QM/MM) simulations, which is used both in solid state systems [20] and 



more commonly in biological systems 21 . There are many approaches to a real-space 
implementation of density functional theory, which can, like Gaul, be roughly divided 
into three part^ finite-difference methods; finite element methods; and the use of local 
basis functions. In these methods, the kinetic, pseudopotential and exchange-correlation 
energies are found in real space exclusively. The solution of the Laplace- Poisson equation 
for the electrostatic potential, however, sometimes retains the use of reciprocal space. 

In all cases, the advantage of real-space methods stems from spatial locality, which 
in turn leads to sparsity of the Hamiltonian; these ideas will re-appear throughout 
Sec. [3] Finite difference approaches represent the electronic states directly on a fixed 
grid in real space, with a finite difference operator for the kinetic energy. Finite element 
(FE) methods |22j use the piecewise continuous local basis of FE analysis. Local basis 
functions represent the wavefunctions in terms of local orbitals, often centred on the 
atoms (e.g. Gaussians). This spatial locality can lead to linear scaling Hamiltonian 
building, and is also at the heart of the linear scaling electronic solvers described in 
Sec. [3j We describe these approaches in outline below, but without the intention of 
giving a detailed review of the methods; other reviews are cited for the interested reader. 



2.1. Finite Differences 

Finite difference methods do away with a basis set entirely, and represent the 
wavefunctions directly by their numerical values on a grid; the grid spacing is one 
parameter by which the convergence can be judged. This approach requires an 
approximation for the differential operator used in calculating the kinetic energy and 
in solving the Poisson equation; the simplest approximation is found by expanding a 
function in positive and negative directions: 

IpiXn+l) = IpiXn) + i''iXn)h + ^^"(x„)/l^ + ... (4) 

/ o 

^{Xn-l) = ^{Xn) - ^'{Xn)h + ^^"{Xn)h^ - ^^"'{Xn)h^ + ■.■ (5) 

where h is the grid spacing and ip{xn) is the value of the function iIj{x) at a grid point 
Xn- By adding these two equations, an approximation for d^ip{xn)/dx^ can be derived 

§ "All Gaul is divided into three parts" , Julius Caesar, De Bella Gallico, Chapter 1 
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;ij{Xn)h^ + 0{h''){6) 



which is accurate to second order in h: 

d'^ll){Xn) 

dx^ - l^2i^y--^^' ' ^y—^' ^2 9x4- 

The errors can be of either sign, depending on the derivatives and value of which has 
the consequence that the FD method is not variational; the loss of variational nature can 
make it harder to converge the parameters used in computational methods. Naturally, 
there are higher order approximations for the Laplacian which can be generated (see, 
for example, the algorithm in Appendix A of Ref. [iT]); the order of the expansion is 
the other parameter which defines the convergence of these methods. An alternative 

has also been used and introduces 



discretisation, the Mehrstellen discretisation 23 



24 



a non-Hermitian Hamiltonian in exchange for greater accuracy of representation for 
a given order. In all higher order discretisation leads to a larger range for 

the operator, which impacts on efficiency. There have been proposals for variational 



representations of the kinetic energy operator in real-space 25 , 26 which have been 



compared in detail 26 ; these representations would alleviate the variational problem in 



finite difference methods. 

Once the Schrodinger equation has been discretised on the grid, it can be written 
in matrix form, with a size proportional to the number of grid points. However, the 
resulting matrices are sparse owing to the locality of the real-space representation. 
Indeed, the main source of spread in the matrices is the kinetic energy term (where 
the order of the approximation chosen will directly affect the locality). The sparsity of 
the matrices makes the method ideal for massive parallelisation [l7] and efficient solvers. 
Solution of the Poisson equation is often accomplished directly on the grid via multigrid 



techniques 35 , without recourse to FFTs which can cause problems to parallel scaling 

It has been applied by 



of plane-wave codes. 



The FD technique has been reviewed extensively 17 19 



a number of groups 12,24,27 32 with extension to PAWs 33 . The solution of the 



equations can be accelerated by the multigrid method 17 34 , which has been extensively 



applied in at least one real-space DFT code 35 . FD methods can also be combined 



with localised orbitals, either fixed to the atoms 35| 36 or with adaptive localisation 
regions 



37 ; these localised orbitals are an integral part of linear scaling approaches. 



2.2. Finite Elements And Local Real-Space Bases 

Finite element (FE) methods are well-known from the engineering field, and their 
application to electronic structure methods has a long history 22 ,38 40 . The method 



uses basis functions which are chosen to be piecewise polynomials, local in real-space. 
The simplest possible FE basis consists of linear functions which are one on the defining 
grid point and zero beyond its nearest neighbours; cubic functions are more common 
[38||40] (and indeed the blip functions mentioned below as a basis for the Conquest 
0{N) code [il] are functionally equivalent to finite element basis functions). 

In the FE method, the unit cell is divided into elements (the simplest of which are 
cubes, but the shape is in principle arbitrary so long as the simulation cell is filled). 
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Once the elements and basis functions are chosen, the Schrodinger equation can be 
written as a matrix equation, as was the case for the FD method (and the similarities 
and differences between the two techniques are elegantly described by Beck [l7j). In 
particular, the FE method introduces non-orthogonal basis functions, which leads to 
a generalised eigenvalue equation [39] (though this is a familiar problem in electronic 
structure techniques seen also for ultra-soft pseudopotentials). The mesh fineness can be 
different in different areas of the simulation cell, leading to efficiencies when large areas 
of vacuum are considered, or allowing all-electron calculations to be performed with 
appropriate resolutions in different areas of the cell. The method shares the advantages 
of locality of the FD method while also being variational, and has been applied in various 
areas 



22,42 46 



Similar to finite elements are another class of real-space basis functions which are 
local in real space, with many of the well-defined properties which make plane-waves 
valuable (e.g. orthonormality and systematic convergence). Daubechies wavelets [47] are 
a specific class of wavelets with attractive properties for electronic structure (particularly 
massively parallel or linear scaling implementations): they are orthogonal and local in 
real and Fourier space; the use of wavelets as a general approach to electronic structure 
has been discussed in detail elsewhere ^8] . It is also possible to make a multiresolution 
implementation |49]. Wavelets have been implemented as a basis in one major ab initio 
code (Abinit). 

Discrete variable representations 50 are another of this class of basis, and have been 



successfully applied to ab initio molecular dynamics calculations 51 . It is intriguing 



to note the close relationship between these basis functions and the psinc functions 52 



described below and used for a hnear scaling code (and earlier used to calculate kinetic 



energy in a real-space DFT code 53 . Both of these basis sets have the property that 
they are non-zero only on one grid point, and zero on all others (known as cardinality; 
wavelets are known as semi-cardinal as they are cardinal only at one resolution). 
However, despite their attractive properties, these basis sets are not yet in widespread 
use. As computational resources shift towards multi-core processors it may be that their 
properties make them more attractive than plane-waves. 

Lagrange functions form another cardinal basis set which have been proposed for 



electronic structure calculations 54,55 . A local, grid-based non-orthogonal basis also 
used in linear scaling methods are blip functions (or b-splines) [4l], which can be 
shown to be a form of finite element. They have also recently been used for linear 



scaling Quantum Monte Carlo calculations 56 . The psinc functions mentioned above 



which are periodic bandwidth-limited delta functions, are another local basis set 52 



interestingly, almost the same functions were derived in the context of optimal local 
basis sets 57 . One of the first ab initio 0{N) methods proposed |58j used a plane-wave 
basis to represent localised orbitals. 
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2.3. Atomic-like Orhitals 

Functions which mimic the atomic wavefunctions near the ionic core are a popular choice 
of basis function, which make sound computational sense: they provide an excellent 
solution for much of real space and are spatially local. However, to model correctly the 
changes which occur to electronic structure on formation of bonds, variational freedom 
is required, including both radial freedom for the valence electrons and angular freedom 
(often solved by adding orbitals with higher angular momentum than the valence states). 
Numerical atomic orbitals (NAOs — used for all-electron calculations) or pseudo-atomic 
orbitals (PAOs — used with pseudopotentials for convenience) are in wide use in both 



standard and linear scaling codes 59 67 , though this is by no means an exhaustive list 



(other bases include Gaussian-based orbitals 68 ,69 , muffin-tin orbitals, and augmented 
plane- waves) . Numerical atomic orbitals have been recently reviewed 70 . 

These basis functions are written as a radial function multiplied by a spherical 
harmonic (normally the real spherical harmonics): 

Xnlra{v) = Rnl{r)Yi{r) (7) 

The formalism allows for efficient evaluation of matrix elements. Analytic operations 
are possible for the angular terms, while the radial terms are performed in reciprocal 



space with a very fine mesh 59 , 71 , or analytically, for Gaussians. Typically, functions 
are confined within a sphere which removes extended tails and results in sparse matrices. 
Even for conventional codes, this can render the building of Hamiltonian matrices linear 



scaling (discussed further below, in Sec. 2.5), reducing a significant cost 



The major drawback with these basis sets is the lack of systematic convergence: the 
number of radial functions in a given angular momentum channel can be increased (often 
known as multiple zeta or multiple valence, so that two radial channels are notated DZ 
or DV) and extra angular momentum channels can be included but there is no clear rule 
as to how functions should be added to systematically improve the energy. There have 
been studies which show that convergence can be achieved, and which suggest routes 



to creation of convergent basis sets 59 , 60 , 62 but these schemes lack the simplicity of 



basis sets with a single parameter (e.g. the kinetic energy cutoff for plane-waves, or grid 
spacings for analytic real-space methods); an example of the convergence with respect 
to basis set size is shown in Fig. [l|^a). 

The problem of confinement has generated a number of different solutions. The 
simplest approach is effectively to impose an infinite potential well of some radius 



on the atom 71 , though this has the side effect that the derivative of the orbital 
is discontinuous at the boundary, which can cause problems with the calculation of 
forces and stresses. The confinement excites the atom slightly and mimics the effect of 
condensation into a molecule or condensed matter environment. However, it is not clear 
how to confine the atom, particularly as different orbitals will have different ranges. 

To avoid the discontinuity produced by an infinite potential, a number of 
suggestions have been made for an alternative potential (surveyed, along with the 
methodology known as ab initio tight binding, in a review [73)). Confining potentials 
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Figure 1. (a) Convergence of energy with PAO basis size for bulk silicon in siesta 72 
(b) Change in shape for atomic orbital following optimisation within OpenMX 62 
Reprinted figures with permission from J. Junquera et a/., Phys. Rev. B 64, 235111 
(2001) and T. Ozaki, Phys. Rev. B 67, 155108 (2003). Copyright (2001) and (2003) 
by the American Physical Society. 



suggested include: simple polynomials: (r^ 74 , 75 ); smoothing the free atomic 
wavefunctions with an exponential using a cutoff and width over which the smoothing is 
applied 65 ; an exponential potential applied between two points 72 ; a cubic truncation 
between two points (applied to the bare atom potential) [62]; and a product of an 
exponential and on the free atom (for all-electron calculations) [60]. All these 
different schemes produce a smooth transition to zero in the tails of the orbitals. 

As well as methods to confine the orbitals, there are many different methods to 
generate the basis sets themselves. These can be split into two approaches: first, how to 
generate a set of either pseudo atomic orbitals or numerical atomic orbitals (depending 
on whether or not a pseudopotential is used); and second, how to use these as basis 
functions (either as they are or combined into other functions). We will consider these 
two problems in turn. 

As there is considerable flexibility in deciding, for instance, at what radius to 
cut off a function, or how many radial functions to use for each angular momentum 
channel, much effort has gone into deciding how to generate accurate basis sets with 
minimal computational effort and human intervention. An early approach to the cutoff 
problem [76] was to use the energy change found on confining individual orbitals 
(typically in the range 50-300 meV) as a single, balanced criterion. The advantage 
is that one parameter can be used for orbitals of different inherent sizes, and this is 
used extensively in the siesta code (see Sec. 5.1 for more details). Another approach 
within SIESTA is to optimise the orbital shape relative to a highly converged plane- wave 
calculation by varying the confinement potential, but applying a fictitious pressure-like 
quantity 77 to stop expansion of the orbitals beyond a reasonable size. This idea of 
optimising the confinement has also been applied to a damping function multiplying the 
orbitals 



Spillage 79 80 is an idea which can be used when optimising atomic orbitals. In 
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a periodic system, it is defined as: 



S 



1 1 



5:5:(v^,(k)i[i-p(k)] i^.(k)) 



(8) 



where is the number of k points, Nb is the number of bands, |'?/'j(k)) is a Bloch state 
found with a plane- wave code and the projection operator P(k): 

P(k) = ^|0,(k))5^-J(0.(k)| (9) 



with S^y the overlap between atomic orbitals (see Sec. 4.1 for more details on non 



orthogonality). The spillage measures how well atomic-like orbitals can reproduce 
the wavefunctions from another calculation (e.g. highly converged plane-wave basis 
calculations). The spillage is used to guide optimisation of atomic-like orbitals and 
to investigate the generation of transferrable basis sets. For instance, basis sets using 
orbitals from both neutral and positively charged ions, and optimising cutoff radius to 



minimise spillage 65 were found to be transferrable and accurate. Spillage has also 



been used to optimise pseudo-atomic orbitals with radial functions are expanded in 



terms of Bessel functions 81 ; the functions are fit to converged plane-wave calculations 
of dimers. 

Generation of basis sets consisting of atomic-like orbitals can also be performed in 
a semi-systematic way. For instance, generation of successive solutions of a confined 
atom with increasing numbers of nodes for a given angular momentum channel |62], or 
building a large set of functions including different cutoffs, angular momenta, Rydberg 
functions and extend basis functions, and ordering the set by searching for the function 



which lowers energy most on addition to the existing set 60 . A method for the direct 



optimisation of the radial function |82j within a self-consistent loop has also been given 
to yield an optimal minimal basis (this parallels to some extent the representation of 
support functions or generalised Wannier functions by b-splines or psincs mentioned 



above in Sec. 2.2; it has also been suggested that localised functions can be expanded 



in terms of spherical waves 83 for free electrons, and recent analytic developments 84 



have simplified and improved the scaling of this method). Methods for optimising 



Gaussian basis sets are also available 69 , 85 



While a large set of atomic-like orbitals can be used directly to represent the 
wavefunctions, it can be more efficient to combine them into a smaller set of functions. 



Polarised atomic orbitals 86 are one way to do this: a minimal set of polarised atomic 



orbitals is defined in terms of a large basis set of standard quantum chemistry orbitals. 
As is to be expected, the contraction results in a small increase in total energy, but 
the convergence is good, and the error is linear in system size (indicating size extensive 
behaviour, and local errors); moreover, the structural relaxation is reliable. The idea 
has been refined to extract polarised AOs from molecular orbitals ^87j which is closely 
related to the extraction of Wannier functions described in Sec. 13.11 It has also been 
extended 88 so that minimisation of polarised AOs and the density matrix is separated; 



this was implemented within an 0{N) method and shown to be effective. A related 
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method 89 uses Cholesky decomposition to extract localised molecular orbitals from 
the density matrix. 

The OpenMX code mixes large numbers of PAOs (typically six per angular 



momentum channel) into a set of orbitals equivalent to a DZP basis 62 with the 
PAOs simply generated as orthonormal functions (by increasing the number of nodes) 
for a confined atom; the change in radial function following optimisation is illustrated 
in Fig. ^h). An optimised set of PAOs has been generated for calculations of biological 
molecules (covering H, C, N, O, P and S) p33|, and it was suggested that optimisation 
at each step of the minimisation is not necessary. A similar approach is used for the 
representation of support functions with pseudo-atomic orbitals in Conquest [66'|. It 
is important to note, though, that there are certain symmetry-imposed restrictions on 
the number of support functions that can be used: for instance, trying to represent four 
support functions only with / = 0, / = 1 and I = 2 will give incorrect answers in bulk 
crystals (breaking symmetry). A brief overview of approaches to PAO mixing, as well as 
a scheme for mixing two PAOs (neutral and 2^ atom) has been given [90] . A comparison 
of the method used in OpenMX and using spillage [9l] shows that comparison to plane- 
wave results with chemical accuracy can be achieved with localised orbitals, though two 
functions per angular momentum channel (DZDP) are generally needed (and up to 
1 = 2). A method combining gaussians from multiple sites into a minimal basis on each 



atom [92 93 has been proposed. It uses a filtration algorithm (c/ the FOE method 



below in Sec. 3.2.3) which removes unwanted high energy components to optimise the 
orbitals, and may allow large systems to be calculated on modest resources. 

Overall, it should be clear that there is considerable effort being made to understand 
and optimise atomic-like orbital approaches. Given the history of the method, it is 
perhaps a little surprising that there is still so much work to do, but that simply reflects 
the impossibility of finding a perfect basis set for the diverse environments modelled by 
density functional theory. 

2.4- Representing Localised Orbitals 

When performing 0{N) calculations, many codes represent the density matrix 



(described below in Sec. 3.2) in terms of localised orbitals, (^iQ,(r); for instance, onetep 
calls these non-orthogonal generalised Wannier functions and Conquest calls them 
support functions. These functions can be simply individual PAOs, or more generally 



represented in terms of a basis set, either the atomic-like orbitals of Sec. 2.3, or the local 



real-space basis sets described in Sec. 2.2 



Using atomic-like orbitals is convenient and gives a relatively small basis size. In the 
limit of a single PAO per localised orbital, of course, there is no optimisation required, 
which removes a level of complexity from the calculation; however, it is important to 
note that there can be problems for inverting the overlap matrix in the case of double 



zeta basis sets (discussed further in Sec. 4.1). Atomic-like orbitals also suffer from basis- 



set superposition error l94); while the magnitude of the error reduces with basis set size. 
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it is still significant 95 , and significantly worse for more contracted basis sets 67 , 
though can be corrected very successfully. These basis sets are widely used (e.g. in 
OpenMX, siesta and Conquest). 

The local, real-space basis sets, such as the psincs used in onetep and the b- 
splines used in Conquest, can be converged systematically to the plane-wave limit 



41,96 and are free from basis-set superposition error [97]. The resulting orbitals 
will be more transferable as they are optimised in situ, and possibly more accurate. 
However, these basis sets require more computational effort to converge than PAOs, 
and calculating small energy differences between structures will require tight tolerances 
on the minimisations. 

The real question when deciding on a basis set is that of accuracy versus 
computational cost. The minimisation is variational, which means that less effort will be 
required once the initial functions have been converged, and also that library functions 
could be calculated and read in as a starting point. In terms of matrix multiplications, 
which lie at the heart of linear scaling codes, real-space basis sets have a significant 
advantage: as a minimal number of orbitals can be used, the multiplies are significantly 
faster. For instance, for Group IV elements such as C and Si, the computational cost 
to go from four orbitals (minimal) to nine orbitals (the smallest number possible when 
using polarisation functions in a PAO basis) is a factor of 11, and going to thirteen 
orbitals (a double zeta plus polarisation) is a factor of 34. These factors can offset the 
extra time required but there is no single correct answer. 

2.5. Hamiltonian Building 

Many methods which are not linear scaling in the search for the ground state nevertheless 
use localised real-space basis sets, and rely on this locality to build the Hamiltonian in 



a linear scaling manner 61 98 100 . The computational effort required for Hamiltonian 
building is significant, and for a few hundred atoms with localised orbitals the resulting 
matrix can be exactly diagonalised efficiently, with most effort being spent in the 
Hamiltonian build. The Hamiltonian typically is made up of different terms, which 
can require different treatments: kinetic energy; electron-ion interaction (either via 
bare Coulomb term or pseudopotentials); Hartree energy; and exchange and correlation 
energies. The kinetic energy is inherently local (and only shows spread for high order 
finite difference methods) and will not be considered further. 

As individual basis functions are local in space, the integrals required to form the 
Hamiltonian can be reduced from 0{N^) scaling (the integrals between all pairs of 
basis function (A^^) must be evaluated over the whole system (A^)) to 0{N) scaling. 
For non-local pseudopotentials, the electron-ion interaction can be evaluated using 
a separable form to give integrals between localised orbitals and non-local projector 
functions, followed by a matrix multiplication. The local part is more complex: it is 
formally written as = J2k{4'i\Vk\4>j) for the matrix element between atoms i and j, 

which involves three-centre integrals and hence poor efficiency. The standard solution 
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involves integration on a grid (making the potential by summing over atoms k before 
the integration is performed), often using the charge and potential from the neutral 
(sometimes free) atom [59|[7l] to form a smoothly varying function (which is relatively 
insensitive to grid spacing). A method to make the neutral-atom potential separable has 
been proposed 101 which involves expanding the potential in terms of local functions; 
this allows the potential to be evaluated in the same way as the non-local potential. 
As the potential is spherically symmetric, it can be expanded with radial functions 
and spherical harmonics (up to / = 6 was found to be sufficient for convergence 101| ). 
This procedure removes any grid dependence apart from a charge density difference 
6n{r) = n{r) — I]i^j(i") with ?2j(r) an atomic density. 

The Hartree potential (found as the solution of the Poisson equation) is often solved 
using fast Fourier transforms, which are strictly 0{N log N); the standard approach to 
the electrostatic solution of the ionic problem, the Ewald sum, scales as C(A^^/^), though 
the use of a neutral-atom potential removes the need for this step. The other method 
in common use for electrostatic problems is the fast multipole method, which may scale 
as 0{N) asymptotically (see, for instance, discussions 98,99,102 104| ). Other methods 
used include density fitting 105 , FFTs combined with a wavelet solution for surface 



problems 106 and combining finite elements and Gaussians for the direct solution of 
the Poisson equation with the fast multipole method for calculation of the boundary 
conditions 



107 



The calculation of the exchange-correlation matrix is generally straight-forward, 
but a number of different approaches have been given, both exact and approximate 
71 75 l04| 108 110 . The question of calculating exact exchange within DFT (or 
Hartree-Fock) has been studied extensively, and a number of approaches which scale 
linearly with system size have been derived 111 119 . This ensures accuracy and 



efficiency are possible within local orbital, real-space codes. The route to linear-scaling 
construction of the Hamiltonian building is clear. Now we turn to consider the solution 
for the ground state of the system. 



3. Linear Scaling Methods 

As we have noted above, significant savings can be made even for conventional eigenvalue 
solvers if a basis set which is local in real space is used to represent the wavef unctions. 
The Hamiltonian building process becomes linear scaling, leaving the solution for the 
eigenstates as the most expensive part, and the part with the worst scaling. It is natural 
to consider whether this can be improved as well; herein lies the heart of the development 
of linear scaling codes. 

A natural first point to consider is that the Kohn-Sham formulation of DFT 
introduced the wavef unctions as an aid to solution, not as an integral part of the 
formalism. Indeed the Hohenberg-Kohn theorems rely only on the charge density of 
the system; we might ask whether a search over charge densities might not be a route 
to finding the electronic ground state. This leads to the approach known as orbital-free 
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DFT (OFDFT), discussed in detail in Sec. 3.2.5; in brief, it requires an approximation 



for the kinetic energy which can reduce accuracy, but is used as an efficient method for 
calculations on large metallic systems. 

Instead of the charge density, it is more helpful to work in terms of the density 
matrix, which is defined formally in terms of the eigenstates of the system as: 

p(r,r')=E/nV^n(rX(r') (10) 

n 

where n indexes the eigenstate and fn gives the occupancy of the state. As we are 
still within the Kohn-Sham approach, this is the single particle, two-point density 
matrix (not to be confused with the many-body density matrix familiar from statistical 
mechanics and quantum information). In operator notation, the finite temperature 



density matrix can also be written as the Fermi function of the Hamiltonian 120 
p = 1/ (^1 + exp[{H — jj,)/kBT]j. Many of the properties of the density matrix were 



investigated and summarised by McWeeny 121] . A consequence of quantum interference 



effects on the density matrix is that it is ranged: 

p(r, r') — 7- 0, |r — r'l — )• oo (11) 

However, the functional details of how the decay proceeds is rather complex, and can 
be related to the localisation of Wannier functions. We consider these ideas in the next 
section. 

3.1. Density Matrix Properties and Wannier Functions 

The Bloch states found when solving the Schrodinger equation for a periodic system (as 
is done in most electronic structure codes) are delocalised, and spread throughout the 
unit cell, and hence the entire system. There are many advantages to using localised 
functions, and the arbitrary phase associated with the Bloch states (which can be scaled 
by e"**^ with no change to the properties of the system) gives the freedom to do this. The 
most important route to understanding localisation is via Wannier functions. Wannier 



functions have been used extensively in electronic structure theory 122 -125 . They 
are formally defined for a periodic potential, with Bloch wavefunctions iV'nk) with n 
labelling a band. Then, for the unit cell at R, we can define the Wannier function as: 

V 



with 



kRn) = 2^/c^ke*'^-''|^nk) (12) 



Mr) = e^'-'un^ir) (13) 



and u„k(r) the periodic part of the Bloch function. The inverse relationship allows us 
to write the wavefunction in terms of the Wannier functions: 

R 
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It is important to note the considerable freedom in the choice of Wannier functions that 
the arbitrary phase of the Bloch functions gives. The locahsation of Wannier functions 



is closely related to the range of the density matrix (considered below) [126 ; it is also 
important in considering insulating behaviour against metallic behaviour in condensed 
matter systems 127 . The equivalent theory for non-periodic systems was developed 



for localised molecular orbitals 128 which also can be linked to pseudopotential theory 



129 



The study of localisation of Wannier functions is far from trivial. The earliest 



results 123 showed that the decay was exponential with distance for a one-dimensional 
centrosymmetric crystal, using complex wave vectors. This was extended to three 



dimensions 125 for periodic solids with no degeneracy (with the decay rate related to 



the distance of the branch surface from the real axis), as well as general one dimensional 



130 and three-dimensional |131,132 solids (though only in the tight binding limit 



for the 3D case). Non-periodic systems in one dimension have been shown to exhibit 
exponential localisation 133 , and in the case of a localised perturbation, such as a 



defect, the Wannier functions converge to the periodic functions as the distance from the 
perturbation increases; this result has also been extended to three dimensions |134| . The 
general relation between eigenfunctions of a Hamiltonian and localised orbitals (leading 
to generalised Wannier functions) was investigated thoroughly fl35' . The exact form 
of the decay of Wannier functions in ID has been investigated in detail 136 , and it 



was found that it can be written as a power law multiplying an exponential. If the rate 
at which the functions decay is given by x~°'e~^^ for Wannier functions in ID, a value 
of a = 0.75 describes orthonormal Wannier functions, while non-orthonormal Wannier 
functions result in a = 0.5 or, with careful construction, a = 1.5. A more general, and 
formal study, of localisation has shown that Wannier functions demonstrate exponential 
localisation in insulators with a Chern numbei|||] of zero (i.e. which are time reversal 
symmetric) in 2D and 3D 139 . A recent study of many-body (Quantum Monte 



Carlo) Wannier functions and localisation 140 has shown that localisation (except for 
strongly correlated systems) is similar for one-electron and many-electron systems, with 
the difference related to the correlation hole. This builds on earlier work on natural 
Wannier functions in correlated systems (defined in terms of the natural orbitals) where 



similar localisation properties were found [141j . An important study 142| of the number 
of iterations required to reach convergence in a given system, and how this number of 
iterations scales with system size found that Wannier-representable insulators can be 
considered truly 0{N), with the time to the ground state not dependent on system size 
(though some of the results of this paper have been shown to be pessimistic 143| ). 

It is not the intention of this review to cover all aspects of Wannier function theory 
and their use, though we summarise results relevant to linear scaling methods below. 
There are excellent reviews on this subject, particularly as it relates to polarisation 



138 , 144 . In quantum chemistry, the equivalent localisation procedure (though without 



The Chern number is related to the Berry connection; the Berry phase is an important part of the 



modern theory of polarisation and is discussed extensively elsewhere 137 138 
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Bloch states) is known as Boys-Foster localisation 145 . The modern approach to 



polarisation relies on a definition in terms of the expectation values of the position 
operator in terms of Wannier functions of occupied bands 146 . Much of the modern 



theory of polarisation 144 , particularly relating to the Berry phase, is concerned with 
and overlaps with definitions of localisation, and the difference between insulators and 
metals; full details can be found elsewhere 147 150 . Recent important developments 



in the field also relate to topological insulators, and the wealth of physics contained 
therein. 

Creating Wannier functions is a difficult problem, due to the many possible 
different definitions of functions themselves, and issues with composite bands. An 
early proposal showed that removing the orthogonality constraint created ultra-localised 
Kohn presented a variational method jl24j and an efficient iterative 
both to find generalised Wannier functions. A key development in 



151 



152 



functions 
method 

the use of Wannier functions was a reliable method to produce maximally localised 
Wannier functions (MLWF) [137 . The freedom in phase of the Bloch states results in 
the Wannier transformation being underdetermined. The lack of determination allows 
extra restrictions to be placed on the Wannier functions, without loss of generality. The 
criterion used was to find the Wannier functions which minimised the functional Q, 
defined as: 



(15) 



[Wor 



\r\won) and (r^). 



for a sum over the bands n in the simulation, with (r^) 
{won\r'^\won)', IS the n*'^ Wannier function in the cell at the origin (cf Eq. (12)). 
Despite the real-space definitions, the transformations can be written in terms of 
the Bloch wavefunctions in reciprocal space. This technique has found widespread 
application throughout the first principles community, and has been shown to be 
effective for disordered systems 153 and entangled bands 154|155 ; an efficient, iterative 
approach to forming MLWFs has been given 156 . Examples of MLWFs in silicon (used 
for linear scaling evaluation of the exchange potential and energy 119| ) are shown in 
Fig. [2|^a), with clear sp^ symmetry. The MLWFs from the entangled, narrow d bands in 
Cu, which overlap with a wide s band, are shown in Fig. following disentanglement; 
the d-symmetry of these functions is clearly seen. 

There have been many further developments in finding and using Wannier functions. 
Allowing some unoccupied bands to mix with the occupied bands [157 allows better 
localisation, and in some cases a more intuitive picture of bonding (it is important to note 
that including unoccupied bands has been proposed before 154| for entangled bands; 
entangled bands overlap with other bands across the Brillouin zone, as opposed to, for 
instance, the isolated valence bands of an insulator). A linear scaling technique has been 
demonstrated for the creation of Wannier functions [158 , which uses projection as others 
have before 130, 131, 133 135 , while another technique derived separate eigenvalue 



equations for each Wannier state 159 ; an alternative approach builds Wannier functions 
using perturbation theory to correct a simple initial approximation 160 . Another 
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Figure 2. (a) Maximally localized Wannier functions in silicon used in calculating 
exact exchange with linear scaling effort (from [119 ); (b) Maximally localized Wannier 
functions in Cu (from [154| ). Reprinted figures with permission from X. Wu et ai, 
Phys. Rev. B 79, 085102 (2009) and I. Souza et ai, Phys. Rev. B 65, 035109 (2001). 
Copyright (2009) and (2001) by the American Physical Society. 



linear-scaling approach 161 starts from the ground state density matrix and derives 
the MLWFs from there. A dynamic approach allowing on-the-fly localisation during 
molecular dynamics shows improvements in speed of simulation compared to normal 
band techniques 162 . Several of the linear scaling methods for finding the ground 



state described in Sect. 3.2.1 find the Wannier functions for the system (specifically 
those methods starting with the work of Mauri, Galli & Car 163, 164 and Ordejon et 



al. |165 166] ), and this approach has been used for the calculation of polarisation 167 



while non-orthogonal Wannier functions are used for other problems, for instance for 
a self-consistent implementation of the DFT-(-U method jl68j . The problem of finding 
non-orthogonal localised molecular orbitals 128 ,169 can be simplified |170| by defining 
centroids based on single, double or triple bonds 171 , though this requires some input 
and chemical intuition. A more recent study has investigated the general localisation 
properties of bases for eigenvector problems 57 . An approach similar to Wannier 



functions to generate a minimal basis of quasi-atomic orbitals (QUAMBO) [172 for 
post-processing uses occupied and unoccupied states, and forces the orbitals to be as 
close as possible to free-atom orbitals; the method is applicable to metallic as well as 
insulating systems. 

We also mention that localisation is used in quantum chemistry (see Sec. 3.2.7) to 
improve the scaling of perturbative and more accurate methods; for instance, a recipe to 



create localised orthonormal orbitals for fast MP2 calculations has been developed 173 



Wannier functions (and other localised orbitals) are becoming extremely powerful tools 
in extending the accuracy of DFT and Hartree-Fock methods, and we expect to see their 
use becoming widespread over the next few years. 

The localisation of the Wannier functions for a system is intimately related to the 
localisation of its density matrix. This is easily seen as the density matrix can be written 
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in terms of the bands as in Eq. (10), and is unaffected by a unitary transformation of 
the bands; hence, the Wannier transformation allows the density matrix to be written 
in terms of the Wannier functions, and the localisation properties follow. We can 



write 174 



p(r, r') = E E ^n{r, R)F„(R - R')W:{r' , R') (16) 

" R,R' 

where the occupation matrix is defined by the Wannier transform of the occupancy 

F„(R) = ^/c?ke'k-i^/„k. 

The general principle that the electronic structure of a system is localised is 



summarised by Kohn's principle of near-sightedness 175 . The principle is defined 
in terms of a typical de Broglie wavelength found in the wave function of the ground 
state, which in turn defines a local volume; the thermal de Broglie wavelength A = 



h"^ /SnigkBT might be reasonable definition to start from |176|. Changes to distant 
parts of the system (such that the distant part is far from all points in the volume) have 
a negligible effect on the electronic structure in the local volume. The nearsightedness 
of electronic structure for systems with a non-zero gap can be expressed as: 

p(r - r') ~ e-^l^-^-'l (17) 

with 7 > 0, though, as noted above, an algebraic prefactor can be defined. Much work 
in recent years has sought to relate 7 to the properties of the system. 

As with the Wannier functions, there has been considerable effort devoted to 
quantifying the localisation (or, equivalently, the range) of the density matrix; an 
excellent overview is given by Goedecker [2]. The most elegant and appealing, as well 



as intuitive, results suggests that the decay rate should depend on the gap. Kohn 126 
showed that the decay rate for Wannier functions is proportional to V m*A for a gap 
A (see also p3] ); as m* can be shown to depend on A |174|, this gives an overall 



dependence on A. It was also shown that projection operators for specific bands 



localise exponentially at large distances in the one-electron approximation 130 and 
that in periodic solids with no degeneracies there is an exponential decay related to the 
distance of the branch surface from the real axis |125| (extending the earlier work on 



Wannier function ranges 123| ); remembering that the density matrix is a projection 



operator for the whole system we see the importance of these results. 

There is still no complete analytical understanding of the range of the density 



matrix. One study used Chebyshev polynomials 176 to explore the properties of the 
density matrix and the complexity of different linear scaling methods. This suggested 
that the range was related to the gap (not exponentially necessarily, but oc 1/ a/A) for 
insulators; in metals, a finite electronic temperature, T, is required to localise the density 
matrix, and the range was found to be oc \/T. However, these results were subsequently 



shown to be incomplete. Using Fourier analysis, it was shown 177 that metals (again at 



finite electronic temperature T) show exponential localisation proportional to ksT/kp 
both in real space and also in Fourier space. These properties were studied further 1178] 



using wavelets (see Sect. 2.3), confirming the Fourier space nearsightedness. A careful 
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analytic study of simple systems also found that decay is oc A for semiconductors and 
oc T in metals at low T 174| (certainly in the weak binding limit); the study found 
that some materials show decay oc \/~K in the tight binding limit, but the behaviour 
is complex (depending in detail on atomic potentials). At low temperatures, the decay 
is oc ksT /kp (in agreement with previous work 177 ) with \/T behaviour at high T. 
It was also shown 136 for one dimensional systems that the density matrix decays 
exponentially (in the same way as Wannier functions) but with a prefactor of a = 0.5. 
Overall, there is good evidence to support the exponential localisation of the density 
matrix in insulators, and in metals at finite electronic temperature, though detailed 
behaviour depends strongly on the system. 



Further work on the principle of nearsightedness 179 underlined the earlier results 



that the decay is proportional to A for ID to 3D systems, and some interacting systems. 
There is a well-motivated suggestion that disorder increases the density matrix range 
for insulators and decreases it for metals. Numerical studies of the range of the density 
matrix for the Anderson model with varying levels of disorder |180| have shown that 



coherence is strongly affected by disorder, with the exponential localisation depending 
inversely on disorder. Density matrix decay for both Hartree-Fock and DFT has been 
plotted for different systems 181 . It has also been shown that the correlation between 



fermionic operators is exponentially localised at non-zero temperatures 182 . Figs. [3^ 
and b show the behaviour of 7 as a function of gap (A) for model insulating systems. 
Fig. |3^ plots the behaviour for a periodic one-dimensional potential, period a; this 
is clearly linear for a weak potential (small gap), while stronger potentials are more 
complex. Fig. [3]d shows the behaviour for a simple cubic array of Gaussian potentials 
along different directions in the crystal; the linear dependence of density matrix decay 
on gap is clear in all directions. By contrast. Fig. ^ shows the spatial decay of the 
density matrix in a metal, comparing exact results with a simple model which simplifies 
to give decay proportional to ksT/kp for different values of ksT as a fraction of kp- 
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Figure 3. (a) and (b): Density matrix range (7) as a function of gap (A) for (a) 
a one dimensional insulating system and (b) a three dimensional insulating system 
(from [174j ). (c) Density matrix range in a metal at a range of temperatures, given as 
a fraction of kp (from [177| ). Reprinted figures with permission from S. Ismail-Beigi 
et al, Phys. Rev. Lett. 82, 2127 (1999) and S. Goedecker, Phys. Rev. B 58, 3501 
(1998). Copyright (1999) and (1998) by the American Physical Society 
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Further analytic studies have been performed on specific systems. A study of a 



simple cubic TB model 183 allowed the derivation of analytic results, showing both 



the exponential decay and a power-law prefactor for the density matrix. The decay 
length found depended on both the gap and the hopping integrals (specifically, the 
decay length A oc A/t for hopping integral t); this work was extended to tight binding 
models of metals |184| , where a power law decay was found. Related earlier work using 
a simplified approach to density functional theory (a Sankey-Niklewski approach) for 



Si, C and Al 185 found algebraic decay for Al which was very close to the nearly-free 



electron model, with anisotropic exponential decay in Si and C (with the directions 
along bonds showing the slowest decay). A one-body density matrix with TB model 
for insulators and decay found ID and 2D analytic results with a universal power law 
times exponential but second energy scale emerges when hopping modulates, so that 
the decay is not entirely gap dependent 186,187 . An extension of this work to 2D 



anisotropic hopping 188 showed exponential localisation, but not with the same form 



as isotropic hopping; in particular, the correlation length does not vanish as the gap 
closes. 

While much progress has been made in extending the analytic results for density 
matrix range, particularly based on Wannier functions, it is clear that there is still work 
to be done. The relationship of localisation to the important solid state problems of 
disorder and polarisation (particularly as they extend into the new research areas of 
topological insulators) is fascinating. Furthermore, this area is relevant to problems in 
graphene, where defects can have an extremely strong effect on the electronic structure. 
Overall, the increasing complexity of a system changes the decay properties, and the 
whole area is far from simple. 



3.2. Solving for the density matrix 

Having established that the density matrix, and indeed the electronic structure, of 
matter is nearsighted, we can now turn to the matter of exploiting this nearsightedness 
in the search for the ground state. There have been developments in this field for over 



thirty years, and as might be expected papers comparing the different methods 189 192 



This review 



and general reviews of the subject have already been written 2 , 70 , 193 - 198 
will, naturally, build on these excellent surveys. 

The fundamental quantity in linear scaling techniques is the density matrix, and 
the fundamental property of the density matrix is its sparsity. While there are methods 
which operate using, for instance, Wannier functions, they still rely on the density 
matrix as the fundamental quantity, and the short range of the functions to achieve 
linear scaling. To obtain a linear scaling method, we must impose a range on the 
density matrix, which is a controllable approximation. An appropriate localised basis 
must be used, which will make matrices sparse; however, they must be also be stored 
and operated on as sparse matrices, which requires significant extra effort. Once these 
preparations have been made, the computational effort required to reach the ground 
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state should scale linearly with system size. It is important to note that the choice 
of truncation is imposing an additional constraint on the system, and that there will 
therefore be an extra error (or energy difference) compared to an unconstrained problem. 

The search for the ground state in terms of the density matrix cannot be made in 
terms of the original, six dimensional object p(r, r') (which is defined simply in terms 



of the bands of a system in Eq. (10) above). The most common approach is to work 
in terms of localised orbitals (also called support functions), and to assume that the 
density matrix is separable |199 200| : 



P(r,r') = 0ia(r)i^iaj/30j/3(r'), (18) 

ia,jl3 

where atoms are labelled with roman letters i and j and orbitals on atoms with greek 
letters a and P (we note that the density matrix is often notated only in terms of orbitals, 
as Kij). The only approximation made by this assumption is that the original density 
matrix had a finite number of non-zero eigenvalues (which is at most the number of 
local orbitals used). However, this is not restrictive in the context of electronic structure 
calculations. Much of the rest of this section is devoted to discussing different methods 
for finding and Ki^j/s. However, there are two further conditions which must 

be considered: 

(i) Correct electron number. The electron number is given by: 

Ne = 2Tr [KS] (19) 

where Siaji3 = {(pia\<Pji3) is the overlap matrix between the localised orbitals and 
we assume spin degeneracy, giving the factor of two. We will consider methods for 



imposing the correct electron number below in Sec. 4.2 

Idempotency. The density matrix is a projector onto the occupied subspace, and 
must have eigenvalues of either zero or one. Another way of writing this is, in both 
operator and real-space notation: 

=P (20) 
P(r,r') = J dr"p(r, r")p(r", r') (21) 

This requirement is rather hard to impose exactly, and many approaches adopt a 
weaker restriction (often known as weak idempotency) [194 , 201 , where: 



0<\p<l, (22) 



for the eigenvalue of the density matrix Ap. McWeeny 121 showed how an 



iterative scheme could be used to force an approximately idempotent matrix to 



exact idempotency; this will be discussed in Sec. 3.2.1 



As alluded to above, there are often situations where the localised orbitals used 
to form the density matrix are not orthogonal. This leads to some complications 
in the formalism, which are described in Sec. 4.1[ Briefly, either the inverse of the 



overlap is required (which can be difficult to find for sparse matrices) or some form of 
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orthogonalisation must be applied. If the density matrix for only the occupied subspace 
is required, then it is equal to the inverse overlap matrix of the local orbitals (see, for 
instance, (58 163 165 166| ). 



3.2.1. Direct and Iterative Approaches In this section, we consider two major 
approaches to linear scaling density matrix search, which turn out to share considerable 
theoretical background. We choose to group these methods to emphasise this shared 
background and stimulate further work on the development of effective techniques. 

Galli and Parrinello [58] noted that, instead of writing the density matrix in terms 



of the occupied eigenfunctions (as seen, for example, in Eq. (10) above), it can be equally 
well be written in terms of the same number of non-orthogonal orbitals (pf 



(23) 



(Note that the matrix 5*^^^ is the inverse of the overlap matrix for the orbitals, Sij = 
{(pilcpj), which automatically makes the matrix p(r,r') idempotent.) They then impose 
localisation constraints (in this case, using bucket-like potentials). The formulation 
allows the removal of any explicit orthogonalisation between eigenfunctions (which leads 
to the asymptotic cubic scaling behaviour seen in normal methods). By taking advantage 
of the sparsity of the overlap and Hamiltonian matrices, linear scaling can also be 
achieved; the proposed methods for exploiting locality (involving local volumes where 
most operations are performed) have points of contact with both the divide-and-conquer 



method (Sec. 3.2.2) and the FFT box method (see below under the onetep method in 



Sec. 5.1). This method has been developed further to include unoccupied states, with 



a real-space grid as the basis (using finite differences, as described in Sec. 2.1) 24 



with an extension to allow the centres of the localisation regions to adapt and move 
during molecular dynamics 11,37,202 . These implementations are often not truly 



linear scaling, as the inverse overlap matrix is calculated exactly, though there are many 
methods to remove this final barrier; Galli and Parrinello proposed solving for the dual 



basis functions by an iterative application of (I—S) (described in more detail in Sec. 4.1 ) 



A variational approach using the McWeeny purification transformation 121 has 



been used in a wide variety of approaches and methods 200 201p03 212 ; we will refer to 
it as the Density Matrix Minimisation (DMM) method. The McWeeny transformation 
uses the function: 



fix) = 3x2 _ 2^,3 



(24) 



which is plotted in Fig. |4j It has the property that for — |<x<|,0< f{x) < 1. This 
property is used to impose idempotency during a variational search: if an input matrix 
has eigenvalues bounded by — ^ and |, then the output will be bounded by and 1. 
This is known as weak idempotency. If an auxiliary density matrix is taken as cr(r,r'), 
then the true density matrix p(r, r') is defined as: 

p = 3cr*cr — 2a*a*cr (25) 
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Figure 4. The McWeeny purification function, f{x) — — 2x^ 



3{aH + Ha) 



2(a'^H + aHa + Ha^) 



(26) 



where a * cr(r, r') = / dr"cr(r, r")cr(r", r'). If the band energy (written as 2Tr[pif]) is 
varied with respect to the elements of the auxiliary density matrix, then the resulting 
minimisation is variational and converges to idempotency at the ground state. It can 
easily be shown that: 
5E 

Provided some method is chosen to account for the chemical potential of the electrons 
(either adjusting it to keep the number of electrons fixed, or fixing the potential, as 
described in Sec. 4.2), minimising the total energy with respect to the auxiliary density 
matrix elements leads to the ground state in a variational manner. Imposing sparsity 
on the density matrix (using methods described in Sec. 4.3) and the Hamiltonian results 
in linear scaling. 

This approach was first proposed for orthogonal tight binding 201 203 and 
subsequently extended to non-orthogonal bases 204| (discussed further in Sec. 4.1) 



and density functional theory 200 ,205 as well as finite electronic temperatures 213 



it is often referred to as the LNV method (after Li, Nunes and Vanderbilt). The 
utility of mixing an iterative McWeeny process to restore idempotency with the 
minimisation has also been explored 208 209 214 . Implementations are numerous 



200, 205 207, 211, 212, 215 217 and cover quantum chemistry approaches as well as 



density functional theory, and use minimisation techniques such as conjugate gradients 
and direct inversion in the iterative subspace. Implementations of Car-Parrinello 
molecular dynamics have also been described 218 , 219 , both for a simple fictitious 



electronic mass |218| and a variable fictitious mass to optimise convergence 219 . A 



closely related method to the DMM uses the sign matrix 220 ,221 , which is equivalent 
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to expanding the Fermi matrix (as in the Fermi Operator Expansion described in 



Sec. 3.2.3); another approach including implicit purification for finite temperatures has 



been derived 222 



An alternative approach using similar ideas for purification starts by noting that 
the density matrix and the Hamiltonian should commute (they share eigenvectors, as 
the density matrix projects onto the occupied subspace). By starting with an initial 
density matrix which commutes with the Hamiltonian, and using iterative purification 
methods, it can be proved that the ground state density matrix is the result 214 . This 



should not be surprising, as the density matrix can be written as a function of the 
Hamiltonian: 

p = e{fxI-U), (27) 

where p is the Fermi level and d{x) is the Heaviside step function (with 6{x) = l,a; > 
and 6{x) = 0,x < 1). Approaches to expanding out the step function (or the Fermi 
function as it becomes at finite temperature) using polynomials and recursion are 
discussed in Sec. 13.2.31 



The method first proposed 1214 used the original McWeeny transform for grand 



canonical (i.e. fixed Fermi level) minimisation, with the initial density matrix given by: 



A 



(/il - H) + -I 



Po = 

Pn+l = 3p^ - 2pl 

with A chosen to take the smaller value of l/(ifniax — p) and l/{p — 
eigenvalues of p lie between and 1. The same work also proposed a canonical method 
with an adapted version of McWeeny's purification function. 



(28) 
(29) 

), so that the 



;i - 2Cn)pn + (1 + Cn)pl 



Pr. 



/(I 



if c„ < 



1 



Pn+l 



[I + Cn)pl - pI 

Tr[p^ - pI] 



/Cn if Cn > 



(30) 



(31) 



Tr[p„ - pI] 

This function allows the unstable fixed point of the McWeeny function at x = c (where 
f{x) = X and f'{x) > 1) to move away from c = | to lie between c = and c = 1. 
As a result, the electron number is conserved throughout the iteration. It is important 



to note 121,214 that the grand canonical iteration is equivalent to a steepest-descent 
minimisation of the function /(p) = tr[p^(/— p)^], which links to the orbital minimisation 
method described below. 

This basic idea has been further extended and elaborated in numerous ways. As 
noted above, this approach has been combined with variational minimisation, both as 
an initialisation for the density matrix 208 and to restore idempotency 209 . An 



iterative purification was introduced as a way of correcting density matrices following 
Car-Parrinello steps (rather than imposing orthogonality) |218| . A larger set of generic 
purifications was proposed [223 , based on the equation T„(P) = / — (/ — P)", n > 2, and 
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later extended to systems with unoccupied states 224] (though it has been suggested 



that this extension is no more efficient than the original McWeeny transform 225] ) 



Similar high-order polynomials have been derived elsewhere 49 ; these methods can be 



formulated in terms of only a few matrix multiplications (e.g. four multiplies for a ninth 
order polynomial), but are not in general use. The difference between methods which 
have the same number of filled orbitals as eigenstates, and therefore density matrix 
eigenvalues of 1 only, and those with more orbitals and hence eigenvalues of and 1 is 
key in these expansions: an expansion which only has to consider filled states generally 
requires fewer matrix multiplications (and the form 2P — proposed by Stechel 226 
discussed below is a key example). 



Building on this idea, Niklasson 227 suggested a trace-correcting approach, using 



different polynomial expansions for purification depending on whether the trace of the 
density matrix is below or above the correct number of electrons, A^: 

1- (l-x)"[l + ma;], A^e > 
+ m(l -x)], Ne<N 

If m = 2, then the original McWeeny expansion is recovered. When m = 1 the two 

this has become known as the TC2 (trace-correcting 



Tmix) 



(32) 



X 



2. 



polynomials are and 2x 
second order) method. It has been extended to spin-unrestricted methods 228 and 
non-orthogonal bases 229] (discussed more fully in Sec. 4.1 the main change to the 
algorithm is to require the overlap matrix in density matrix products, so becomes 
PSP where P± is the density matrix in an orthogonal basis, and to require the inverse of 



the overlap for initialisation), and compared to LNV 191 , with some advantage found 
especially for high and low filling (though it is important to note that these iterative 
methods are not variational, and so forces cannot be calculated using the Hellmann- 
Feynman theorem). 

Two closely-related approaches use both the particle density matrix and the hole 
density matrix (defined as Q = I — P). Mazziotti 230,231 recasts and combines the 
formulae of Ref. |224j in terms of the particle and hole matrices (with the method 
depending on the order - hole or particle ffist), and shows significant computational 
speed up compared to the McWeeny approach. 



A similar technique 232 , 233 



uses 



hole and particle density matrices, but includes an empirically- adjusted parameter to 
optimise convergence. 



Niklasson 234 also introduced a series of trace resetting algorithms. This method 
combines a purification polynomial, F{x), and a reoccupation polynomial, G{x) which 
between them purify the density matrix and keep its trace correct within a certain 
domain of applicability. If the trace falls outside the domain, then it is reset by 
application of the TC2 method. The following quartic polynomials have been empirically 
found to be effective: 



F{x) = x^{3x - 3x^) 
G{x) = x\l-x)\ 



(33) 
(34) 
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leading to the TRS4 algorithm. Tests on both high- and low-filling problems (Ceo, a 
zeolite, chlorophyll and water clusters) show that the approach is more efficient and more 
accurate than the original Palser-Manolopoulos method; at mid-filling the methods are 
of similar efficacy. 

A comprehensive comparison of LNV-based minimisation and iterative methods 
finds considerable efficiency gains using purification, though it should be 



192 



emphasised that these methods are not variational, and hence force calculations will be 



complicated. Studies of error propagation for the trace resetting method (TRS4) [234 



with magnitude-based truncation (see Sec. 4.3 for discussion of different methods of 



enforcing sparsity on matrices) as well as the TC2 method 235 show that, applying 
truncation at different stages, errors can be rigorously controlled. A method for 
controlling errors within the TC2 method has been proposed and demonstrated on 



water clusters 236 . These approaches show that it is possible to pursue high accuracy 



within linear scaling methods. 

Despite this plethora of methods, some fairly simple conclusions can be drawn. 
First, that McWeeny's original proposal has been remarkably robust, and has led to 
significant important physics. Second, that for variational methods the formulation of 
McWeeny's purification algorithm in terms of an auxiliary density matrix 201 , 204 is 



the method of choice. It is probably the most commonly used method, and is ideal for 
distance-based truncation schemes. Thirdly, for iterative approaches improvements over 
the original McWeeny formula can be made (e.g. the TC2 method) and error control 
can be introduced. The iterative methods are not variational, but are simple and are 
commonly used in conjunction with tolerance-based truncation. 

Another class of methods, which is closely related to the minimisation and 
iteration techniques just described, builds the Wannier functions of a system by direct 
minimisation without constraint. The ideas were proposed independently by Mauri, 



Galli and Car 163 and Ordejon et al. 165 , though the method is generally referred to 



as MGC; given that the heart of the method is orbital minimisation, we suggest that 



the method be known as the Orbital Minimisation Method (OMM), following Ref. 237 



In both cases, the density matrix is defined in terms of A^/2 localised Wannier 
functions which tend to orthogonality as the minimisation progresses. By introducing 
an approximation to the inverse overlap (which coincides with the density matrix if only 
occupied states are considered), we can write |163 : 

P(r) = 5I</'*W<5ii0i(r) 



K 



Q 



K 



(35) 
(36) 

where K is an odd integer. When K = 1, then the approximation is Q = 2J — S*. This 
same formula was derived by Ordejon et al. 165 starting from a Lagrange multiplier 

5,; 



approach to enforce orthogonality (adding a term Y^ij ^ij{Sij ^i], 
and substituting the value of the multipliers at the minimum (Ajj 
the orbitals. After a little re- arrangement, this yields the same functional; the family of 



to the band energy) 
Hij) for all values of 
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polynomials can also be derived by rearranging the original equation to A = H+{I — S)A 
and treating it as a recurrence relation 166| . (The Ordejon et al. approach has some 
commonality with a method due to Wang and Teter described in Sec. 3.2.4 , though is 
more flexible.) It can be proved 163,165,166,238 that the energy is a minimum at 
orthogonality provided that the Hamiltonian is negative deflnite (which can be enforced 
by applying a rigid shift to the Hamiltonian of an amount if). The functional often 
quoted for these methods is then: 



E [Q, {0}] = 2 Q^jT,^ + P[P\^ +v[n-J dvp{v] 



(37) 



where p is deflned in Eq. (35) and Q is deflned in Eq. (36). Given a suitable basis 
set (e.g. localised atomic orbitals, as described in Sec. 2.3), the minimisation expresses 
the orbitals in terms of this basis, and seeks the minimum energy in terms of the 
expansion coefficients, while also applying a localisation criterion to the orbitals. At the 
minimum, the resulting orbitals will be orthogonal, by construction. 

However, as described, the method has a serious problem: there are large numbers of 
local minima, leading to severe convergence problems 164 ,166 ,238 . While the method 



can be used, and has been implemented in this form, practical solutions require some 
input guess for the orbitals based on chemical intuition; a more general solution to this 
problem is required. Kim et al. 164 showed that generalising the original formulation 
so that more orbitals than just those spanning the occupied subspace could be used. 
This means that the orbitals are not orthogonal at the energy minimum, but has the 
advantage that local minima are avoided. Hierse and Stechel 199 also generalised the 
original OMM approach, using a different approximation for the trial density matrix, 
using Eq. (18). Yang 239 used a variational approach to derive a general functional 
of this class which can use unoccupied orbitals and which only requires a Hermitian 
matrix (as opposed to a positive deflnite matrix required in previous work |199|). It 
is interesting to note that Kohn suggested a method for building orthogonal Wannier 
functions 124 which may be seen as a precursor to these methods; he noted that, unless 
the starting functions were reasonably close to the flnal functions, there were multiple 
minima. 

There are numerous implementations of these methods: the original papers 



163 165 166 238 ; the generalised versions 164,199,240 241; in parallel 242,243 



with ultra-soft pseudopotentials 36 . A real-space implementation of similar ideas 202 



uses exact inversion of the overlap, thus not forming a strict linear scaling method 
(though the cubic scaling part will have a small prefactor). A method for projecting 
locahsed functions onto the occupied subspace 244| using the Fermi Operator Expansion 
described in Sec. 3.2.3 showed improvements in convergence, but does not solve the 
problem of initial functions. 

More recently, Tsuchida 44 , 237 has proposed an augmented orbital minimisation 
method to overcome the convergence problems. The essence of the method is to deflne 
highly localised kernel functions (which contain the centre of a localisation region, and 
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are typically around 1 ao in radius); the localised orbitals are then forced to be orthogonal 
to the kernel functions. This change is sufficient to make the method stable and rapidly 



convergent. Indeed, the applications described in Sec. |5.2| suggest that this is one of the 
most successful hnear scaling approaches. 

At one level the density matrix-based methods appear quite different to the orbital 
minimisation methods: in the first case, the elements of the density matrix are the 
variables, while in the second the orbitals themselves are the variables. However, we 
can make a close connection to the methods described above, by considering alternative 
ways of writing the density matrix; these demonstrations have been developed by a 
number of authors [167 , 193 194, 205| . If the OMM functional Q is written in terms of 



density matrices (compare Eq. (35)), then we find: 
p(r,r') 



2p(r, 



P ir,r 



(38) 



where p(r, r') = J2i Xi(i")xj(i"') is a trial density matrix, and 2p — acts as a purification 
transformation. This can be generalised if the localised orbitals Xi{^) a-i'fi expanded in 
a basis: 



We can then write the trial density matrix in exactly the form of Eq. (18), 
Lap = Hi biabii3- The purification transformation is then written 2L — LSL, 



(39) 

with 
with 



Sai3 = {<Pa\(l>p)- Setting L 
OMM form. 



/ gives 5* as the density matrix, and recovers the original 



Returning to Eq. (38), we note that p is required to be positive semidefinite, and 



that the eigenvalues can thus be represented by 205 : 



Ap = (40) 
= A,(2 - A,) = kI{2 - kD (41) 

where Kp is real. The quartic function will lie in the range [0,1] when -^2 < < ^2, 
with turning points at Ap = and 1. This function is plotted along with the original 
McWeeny function in Fig. [5j It is clear that the two methods have a very similar form 
between zero and one, but the quartic potentially will introduce more local minima. 
The practical convergence rates for the methods (distinguishing between the MGC and 
Ordejon et al. methods) has been examined |190| , with the convergence rate in the 
simple systems (bulk Si and C) found to be dominated by the spectral properties of the 
Hamiltonian rather than the method. 

The DMM and OMM methods share a fundamental connection, but the DMM is 
far more commonly implemented. This is in part due to simplicity and stability, and 
also to the single minimum present in the functional. The augmented OMM promises 
significant improvements and suggests that a new resurgence of OMM applications might 
be seen; either way, these two methods are the most commonly used and applied linear 
scaling methods (see Sec. 5.1 for a description of implementations in specific codes). 
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Figure 5. The McWeeny purification function (solid black line) plotted with the 
implicit purification function of the orbital minimisation methods (dashed blue line). 
(Note that x is taken to mean Kp in the latter case.) 



3.2.2. Divide and Conquer The divide and conquer method 245 -247 is conceptually 



the simplest of the linear scaling approaches. Taking advantage of the nearsightedness 
of electronic structure, the system is divided into separate, small subsystems whose 
electronic structure is solved exactly, for example by diagonalisation. (A similar method 
was proposed almost simultaneously 248 , which combines the idea of dividing up 



the cell into subsystems with the ideas of orbital-free DFT (see Sec. 3.2.5); however, 
this method has not proved significant.) A partition function pa is defined for each 
subsystem, such that J2aPa{^) = 1 points in the system. If we rewrite the charge 

density, then we have: 



77, r 



■- 2{r\9{eF - H)\r) 
= 2j2p^{r){r\e{eF - H)\r) = E^-W 

a a 

{r) = 2p^{r){r\9{ep-H)\r) 



(42) 
(43) 

(44) 



where 6{x) is the Heaviside step function. However, this requires some approximation 
of the system-wide Kohn-Sham Hamiltonian. Making a local approximation to the 
Hamiltonian requires some way to determine ep; Yang suggested writing ha{r) = na{r), 
with: 

n,(r) = 2p,(r)(r|/(e^-i^„)|r) (45) 

where Ha is the approximation of the Kohn-Sham Hamiltonian in the subsystem. Yang 
assumed non-orthogonal basis functions purely localised within the subsystem, \(f)f{r) \ 
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and defined: 

^a = ElC)^^(0"l (46) 
ij 

where = J2ki{^a^y''i-^a)kiiS~^y^ , is an inverse matrix for the overlap {Sa)ij = 
{4>f\(f>'j) and {Ha)ij = (</)"|-f/|0°). The only system-wide constraint is the electron 
number, which gives a value for ei;'. A finite temperature is used to ensure that the 
electron number is monotonic and continuous as ej? is varied. The approximate band 
energy is given by: 

E = 2^^/(6^ - et)etmP"m) (47) 

a i 

Once a partitioning of the system, and a local basis set for each subsystem, has been 
chosen, the solution for each is found by direct diagonalisation of H^J and {Sa)ij- The 
system-wide Fermi level is fixed by the requirement on electron number, which is then 
used to construct a charge density, and fed back into the local Hamiltonians. A self- 
consistent solution is thus found for the entire system. 



The formalism has been extended to solid state systems 249 : here, the idea of 



buffer atoms is required. For each subsystem, a buffer zone of a certain width from the 
main system is retained to remove surface effects. Comparing the error as a function 
of the size of the buffer, the authors conclude that cohesive energy is converged to 0.1 
eV when there are 40-50 atoms in buffer, but the electronic structure (in particular the 
density of states, or DOS) is much slower to converge with buffer size. Forces have been 



derived for the method [250 , though the choice of applying the divide and conquer 
approach to the total system force (rather than differentiating the divide and conquer 
energy) may lead to slight discrepancies between the force and energy gradients. A more 
recent study of forces 251 showed that it is possible, though time-consuming, to derive 
an exact gradient, and proposed another approximate solution. 

A further refinement based the partitioning and calculations on the system density 



matrix rather than the wavefunctions 252 : the charge density is built from the 



density matrix. A Mulliken-like partitioning is used for assigning the density matrix 
to subsystems; the partition matrix is defined as: 

{1 if z,j G a 

1/2 if z or J G a (48) 
iiij^a 

and then we have: 

Krj=T.l^K,,=J2K^ (49) 

a a 

K?, = 'ip% E fi^F - eDCtr.C'^^ (50) 
This method has the advantage that it removes the need to perform integrals between 



subsystem eigenstates and the projector function. An alternative partitioning 253 



based on the number of subsystems a density matrix occupies has also been proposed. 
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The divide and conquer method has been implemented a number of times, for 
instance in the OpenMX code 254| (in this case in combination with a recursion 



method, as described in Sec. 3.2.3), in the siesta code 255 and in an extremely 
large scale approach 256 258 based on hierarchical real-space grids, which scales over 
multiple HPC centres. An implementation using the QUAMBO approach to localised 
orbitals also exists |259| . The divide and conquer approach has also been applied to 
quantum chemistry (up to the CCSD level) |260[ 26l] , and implementation of exact 
exchange interactions has been described |262| (where the accuracy is very dependent 
on the size of the buffer region). An assessment of the computational time required for 
different systems 179 with divide and conquer has been made, and relates the time 
needed to the required accuracy, as well as physical attributes of the system. 

The main approximation in the divide and conquer method is in projecting the 
Kohn-Sham method onto small areas; this means that there is no variational principle, 
and the exact result will depend on the choice of partitioning unless extreme care is 
taken. Moreover, the convergence with subsystem size can be very slow, so that good 
convergence is hard. Formally exact partition theory 263 , 264 suggests a route forward 
in putting these approaches on a more exact footing. The key idea is to transform a 
set of interacting fragments of a molecule or other system into a set of non-interacting 
fragments in an effective potential (in exact parallel of Kohn-Sham theory). However, as 
with KS theory, the approach does not have an analytic functional giving the effective 
potential; intriguingly, the obvious local approximations are strongly related to orbital- 
free DFT (discussed below in Sec. 3.2.5). 

There are a number of methods related to divide and conquer. The 3D fragments 



265 266 builds on an earlier charge patching method 267 268 , and gives a good route 



to modelling large semiconductor systems, though it requires passivation of surfaces of 
the fragments. The key idea is that each fragment is part of several differently sized 
fragments. For simplicity, consider two dimensions and square fragments: then each 
fragment forms one small area (1x1), and is part of four rectangular areas (2x1 and 
1x2) and four large square areas (all 2x2). These are combined (with differing signs) to 
yield the total energy. The method as formulated is variational, making the forces easy 
to calculate. The Mosaico method 269 combines aspects of divide and conquer and 



a Wannier function method (the method seeks local molecular orbitals within specific 



areas); it has been implemented and applied within siesta 270 



The Fragment Molecular Orbital (FMO) method 271 uses a similar idea to the 
divide and conquer method (dividing a system up into small pieces) and applies it 
to the create an approximate method for proteins and related biological molecules. 
The total energy of the whole molecule is calculated from the energy of fragments and 
pairs of fragments without solving the molecular orbitals of the whole molecule. In the 
calculation of the molecular orbitals of fragments, they introduce a special technique 
for treating the bonds at the boundary of the fragments. The method has found a wide 
variety of applications 272 . There is a closely related method (the molecular tailoring 
approach 273 which introduces a different method for defining the fragments. There 
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are also proposals to use the result of an FMO-like calculation as the input density for 



a conventional or linear scaling method 274 



Two recently proposed, related methods 275,276] are related to the main idea 
of divide and conquer, but can yield exact results (though not in linear scaling time). 
Domain decomposition in a quasi-lD system (i.e. one with two dimensions much smaller 
than the other) shows linear scaling and good efficiency 275 , though the generalisation 
to 3D is not given. Once the system is subdivided, a block-diagonal Hamiltonian can be 
built and either the KS eigenstates in each block can be used as a basis for creating a 
structured H or an LDL factorisation can be used to generate the Green's function. A 
similar but more general method 276 uses contour integration of the Green's function to 
get the density matrix; after re-ordering the Hamiltonian to create a structured matrix, 
LDL^ factorisation is used recursively to find the Green's function. The method, though 
not explicitly linear scaling, is efficient: it is A^(log2A^)^ in ID, N"^ in 2D and N"^^^ in 
3D, and is exact. 

Other related approaches have been proposed for metallic systems. A KKR method 
where the system is divided up into local interaction zones (LIZ) |277| then solves for 
the Green's function within the zone using multiple scattering theory. Improvements 
include embedding in a local environment 278 ,279 . This contrasts with the screening 



in k-space found to lead to linear scaling in the number of layers for multi-layers and 
surfaces 280,281 . Recent work suggests that combining the two approaches, so that 



real-space screened KKR is used for large imaginary energies and k-space screened KKR 
for small imaginary energies, reduces scaling to 0{N^+'), where e < 0.2 |8§|83^. The 
method can be made truly linear scaling by using iterative minimisation 284 ; screened 



full-potential KKR has also been developed 285 



The simplicity of this type of approach makes it extremely attractive. It is easy to 
implement and has an obvious relation to the shortsightedness of electronic structure. 
However, the rate at which convergence to the exact result is achieved relative to 
subsystem size, and the lack of a variational principle, make these methods unsuitable 
for quantitative calculations with full DFT accuracy. 



3.2.3. Recursive and Stochastic Approaches The electronic structure of a system can 
be evaluated from the density of states as well as from the eigenvalues; for instance, the 
band energy can be written: 

^band = dEn{E)E (51) 

J —CO 

where n{E) is the density of states and Ef is the Fermi level (practically, the lower bound 
on the integral is normally taken to be the bottom of the band of occupied states). The 
potential to use this in linear scaling methods becomes clear when the density of states 
is written as a sum over local densities of states (LDOS). If a localised basis set such as 
pseudo-atomic orbitals is used, with {(j)ia{r)}, then we can write: 

n{E) = J2n^aiE) (52) 

ia 
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As with any function, the LDOS can be described by its moments 286| (for instance: 



its mean, width, skew, etc which correspond to the first, second, third etc moments). 
But in a local orbital basis, these moments can be related to powers of the Hamiltonian; 
the pth moment of the LDOS Uia, /i^a , is given by: 



dEEPrii^iE) = {ia\HP\ia) 

-f^i«il/3l-f^jri/3lj2/32 • 



E 

■•,ip-i/3p- 



H 



jp-ifSp-lia 



(53) 
(54) 



where Hiaj/^ = {ia\H\jf3). In a simple tight binding picture, this corresponds to hopping 
around a lattice following closed loops of length p. 

However, this is, in general, a rather unstable way of building the DOS for a system. 



One stable technique is the recursion method 287 , which is a Green's function method 
The LDOS is written as: 

1 



riiaiE) = lim Im {da.iaiE + ir])} ■ (55) 

IT ri^O 

This Green's function can be written in terms of a continued fraction, whose components 



are related to the elements of the tridiagonalised Hamiltonian of the system 287 The 
element Goo{Z) (where GnmiZ) = {Un\G{Z)\Um)) can be found from: 

1 



GooiZ) 



(56) 



ai 



/)2 

"2 



Z - a2- 



ft2 

"3 



with a„ the diagonal element and bn the off-diagonal element. But most Hamiltonians 
are not tridiagonal. 



The Lanczos recursion algorithm 288 is an efficient scheme for tridiagonalising a 



matrix. Consider a matrix H, which corresponds to the operator H. Let the tridiagonal 
matrix be H', whose diagonal elements are given by a„ and whose off-diagonal elements 
are given by 6„. If the states which tridiagonalise the matrix are \Un), then: 

ir,, if m 



n; 



= {U^\H\Un) = 



0, 



if m = n — 1; 
if m = n + 1] 
otherwise. 



(57) 



and the condition that the tridiagonalising states are orthonormal {{Un\Um) = <^n,m)- 
The method can be extended to non-orthogonal basis vectors with a bi-orthogonal 



approach 289 



The overall procedure in a recursion method is therefore: choose an initial starting 
state; apply the Hamiltonian repeatedly to this (generating a Krylov subspace, which 
will be discussed in detail below); from the resulting tridiagonal Hamiltonian, construct 
the Green's function, density of states and density matrix as required. The application 
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of recursion to tight binding has been described in detail elsewhere 216 ,287 , though it 
was used as the basis of one of the first DFT-based linear scaling methods 290 , with 
application to silicon using a finite difference approach. 

The recursion method as described has a major drawback: the recursion for diagonal 
elements of the Green's function, which are required for the energy, are stable and easy 
to evaluate, but the off-diagonal elements, which are required for force calculations, 
are hard to evaluate with a tendency to numerical instability. The early techniques 
used to work around these problems (using the difference between bonding and anti- 
bonding orbitals for inter-site elements, and matrix recursion) did not adequately 



resolve convergence problems. The Bond Order Potential (BOP) method 291 293 



is a mathematically complex solution to this problem, which involves redefining the 
procedure in terms of a new basis, using sum rules for the Green's function. A block 
BOP 294,295 uses a simpler basis to provide an efficient route to energies and forces 
via recursion. In all these cases, by limiting the range of the recursion to a cluster 
of specified size, and truncating the moments considered, the method becomes linear 
scaling. The block BOP method has been demonstrated with DFT 296 , using a dual 
basis approach. However, BOP methods are more commonly used to bridge between 
tight-binding and empirical methods 297 . Ozaki 254 ] has shown how the divide-and- 
conquer method (see Sec. 3.2.2) can be combined with recursion to create a stable, easily 
extensible linear scaling method. This will be discussed further below. 

A large class of methods use Chebyshev polynomials, which can be defined 
recursively: 



T,+,{H) = 2HT,{H)-T,^,{H), 



(5J 



with Tq{H) = I and Ti{H) = H. The Fermi Operator Expansion (FOE) method 

i tei 



120 , 298 , 299| writes the finite temperature density matrix (or Fermi matrix) as: 

H 



/ 



(59) 



where f{x) = 1/(1 + exp(— x/Zc^T)) is the Fermi function, and then expands the Fermi 
function as a Chebyshev polynomial. Each column of the matrix can be found by 
a recursive procedure, starting with a localised orbital, and applying the Hamiltonian 
repeatedly. Without any truncation, this yields an C(iV^) method (which has been used 
for plane- wave DFT 300| ); with truncation of the elements in a column of the Fermi 
matrix, the method scales linearly with the atoms in the system. It can be shown that 
the polynomial expansion should have a degree of the order of n ~ (cmax — emin)^^^, 
to represent the DOS adequately. The early applications were to tight binding, but 
the method can be extended to DFT 299 . The forces as originally calculated 298 



are not exact derivatives of the energy; an exact expression for the forces has been 
derived 301,302 but it involves significant computational cost. 

There are a number of methods which are closely related to FOE. The Kernel 
Polynomial Method 301 303 differs only in the way that the DOS is reconstructed: 
Gibbs factors are used to weight each term in the polynomial, reducing oscillations 
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which result from the truncation of the polynomial at a finite level. (Note that this is 
related to the use of a terminator in a continued fraction in the recursion method.) Liang 
et al. 304 give various improvements to the original method: they demonstrate fast 
resumming of polynomials; they explore different approximations to the Fermi function, 
concluding that the complementary error function is the best to use; and they show 
how the inverse temperature, (3, can be related to the accuracy required and the system 
properties (in particular band width and gap). The use of Chebyshev polynomials 
to select part of the spectrum has been extended to 0{N) methods 305 , where the 
key scaling issue is inversion of the overlap matrix (which is performed iteratively; 
overlap matrices are discussed in greater detail in Sec. 4.1). A method using maximum 
entropy techniques to extract the moments of a density of states by applying Chebyshev 
polynomials of the Hamiltonian to random vectors 306 uses importance sampling to 



select vectors which give an idempotent density matrix. This idea was extended 307 



to project the random vectors on the occupied subspace as a starting point for the 
recursion method. Wang 308 similarly defines moments of the density of states in 
terms of Chebyshev polynomials, and then calculates these using random wavefunctions 
and maximum entropy methods with a plane-wave basis set to generate the density of 
states and optical absorption spectra. 

Random vectors have also been used in a stochastic approach to invert the 
exponential of a Hamiltonian 309 . Noting that the correct thermodynamic potential 



for spinless fermions is given by: 

fi = -llogdet (j + e^(^-^^ 

the exponential is then rewritten as: 

fi = -^^Plogdet (M(0) 
P 1=1 



(60) 



(61) 



where M{1) = 1 + exp(z7r(2/ — 1)/P) exp(/3/P)(/i — H), with the exponential written as 
a Chebyshev polynomial or a Trotter expansion. It can be shown |309| that expectation 
values can be written in terms of a sum over the inverses of M{1), so that the main 
computational effort is in inverting these sparse matrices. The inverse is found by 
selecting a random vector tp, solving the linear equation M{l)(f) = and calculating 
M(/)~^ = {(pip''), with the expectation value taken over the stochastic process. The 
original application was to a very simple system, but it was developed to tight-binding 
applications in one dimensional systems 310 ,311 , using Langevin dynamics and noisy 
forces. The decomposition of the grand potential has innate scaling 0{NY~'^^'^ for 
a d dimensional system. However, by rewriting the partition function in terms of 
both electrons and ions, and using a careful sampling of the Boltzmann distribution 



an efficient 0{N) method can be developed which is valid for metals 312], though only 
demonstrated so far on tight binding systems (in this case, metallic carbon nanotubes). 
The method has been further developed with an exploration of efficient decompositions 



of the Fermi operator 313 
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The energy renormalisation group (ERG) method is an approach which takes a 
different view of the density matrix. Conventionally, the density matrix is viewed as 
the zero-temperature limit of the Fermi matrix defined above, and as the temperature 
is lowered, the correlations become longer ranged (particularly in small gap systems 
and metals) so that the matrix becomes non-sparse. The ERG method instead writes 
the density matrix as a telescopic sum of terms, with the temperature in each term 
decreasing by a factor g > 1: 

p = F^, + {Fp, - 4,) + {Fp, -Fp,) + ... (62) 

= F^o + Ai + A2 + . . . (63) 

The first term in the series is a high-temperature Fermi matrix which is short-ranged. 
Each successive term gradually corrects for lower temperatures, so that /3„ = q^Po- 
The Fermi matrix is written as a Chebyshev polynomial of the Hamiltonian, just as in 
the FOE method described above. The expansion is substituted into expressions for 
expectation values (rather than calculating a long-range density matrix); a recursive 
approach is given to coarse-grain the Hamiltonian in each successive space, using 



the Chebyshev polynomials. It is developed elsewhere 314 , but has not shown true 



linear scaling beyond ID systems and has not been applied beyond tight binding 
implementations. Recursive bisection of the density matrix 13 15] gives a method which 



scales linearly for one-dimensional systems without truncation, but is highly efficient; 
given density matrix truncation, it would scale linearly with a small prefactor. A 
recursive bisection of real space 316 allows the truncation of Kohn-Sham wavefunctions 
to be controlled, and suggests a possible route to localised orbitals without the need to 
specify centres and extents a priori. 

The essence of a subspace method is the repeated application of the Hamiltonian 
operator to a vector to form a new set of vectors, which span a space; this is often 
referred to as a Krylov subspace method, with the Krylov subspace spanned by the set 
{10), . . . , For instance, a Lanczos procedure generates 



basis vectors in a Krylov subspace which are orthogonal. It has been proposed 317 



that diagonalising within the subspace (similar, in effect, to the method of Ozaki 254 



described above) will give an efficient linear scaling method; they find that around 30 
vectors is sufficient. A more efficient variant of this method 318 solves linear equations: 

{z~H)\x,) = \j) (64) 

for given basis vectors \j) to generate vectors from which the Green's function 
elements Gij = {i\xj) can be found; the basis vectors are the Krylov subspace vectors. 
The technique used is the conjugate-orthogonal conjugate gradient method, which 
generates a set of residual vectors independent of the energy shift z. Once the vectors 
\xj) have been generated for one energy, further energies are almost trivial, and certainly 
scale as 0{1). An overview and summary of applications using tight binding have been 
given |319 , 320| , and the method has been extended to non-orthogonal orbitals 321 . 



The recursion methods were the first set of linear scaling methods proposed, and 
Lanczos approaches are widely used in many areas of physics and mathematics. The 
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most successful ab initio recursion methods are based around FOE-like methods, mainly 
because of the simplicity of Chebyshev polynomials and the overall approach. The 
convergence and general lack of variational properties have made these methods less 
successful than other approaches. 



3.2.4- Penalty Junctionals An alternative approach to imposing idempotency was 



suggested by Kohn 175 : add a term to the energy functional which penalises non- 
idempotent density matrices. This way, as the ground state is sought, idempotency is 
automatically included. The original method defines Hermitian trial functions n{r,r'), 
in terms of which the density matrix n is written as n = n^. The ground state search is 
written in terms of a functional: 

Q [n(r, r )] = E [n] - fiN [n] + aP [n] (65) 

where E [n] is the usual Kohn-Sham energy functional, /i is a chemical potential, N [n] 
is the number of electrons and P [h] is a penalty functional, 

1/2 



P[n] 



drn^ (1 — n 



~\2 



(66) 



For idempotent n, P [n] = 0, so that the penalty functional has no effect. The 
number of electrons is set by choosing fi, leaving only a as a parameter. It can be 



shown 175 that, for a given /i and a larger than a critical value ac, the correct. 



idempotent ground state density matrix is found. However, ac cannot be predicted 
exactly; too small a value of a will yield local minima while too large a value will slow 
convergence. 



A lower bound on ac can be derived 322 for a slightly different functional 



{Q [n(r, r')] = E [n] — jiN [n] + aP [n] where the trial function is used as the density 
matrix) : 

ac > 2 max |ej — /x| (67) 



However, it was also shown 322 that, due to the square-root form of the penalty 
functional, there is a branch point at the minimum which prevents standard 
minimisation approaches such as conjugate gradients from being effective. A corrected 



functional was proposed 323 which removes these problems: 
Q[fi] = E [fi] - fiN [n] + aP [hf 



(6J 



where fi is now taken as the density matrix throughout. This approach does not impose 
idempotency exactly (nor weak idempotency), with occupancy errors which can be 
written as 6fi = — (e^ — A)/a where is a Kohn-Sham eigenvalue at the minimum; the 
occupancies are such that occupied bands have more than one electron and unoccupied 
bands have negative occupancies, which also gives an a-dependent error in the total 
energy. A correction to the energy can be applied as Tr[p(l — p)^(l — 2p)] for occupied 
bands, which can be evaluated in 0{N) steps; the unoccupied bands require a more 
complex correction. Following correction, the method gives a total energy independent 
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of a. An alternative approach uses this functional as the heart of an iterative method 
via an augmented Lagrangian method [324]. 

as part 



The penalty functional method is used within the onetep code 212 



of a cascading sequence of methods (from canonical purification, through penalty 
functional and finally LNV). We note that there was an early proposal which used a 



penalty functional with linear scaling 325 ; the method sought highly localised Wannier 



functions in a basis of atomic tight-binding orbitals without explicit orthogonalisation, 
but a penalty applied for non-orthonomality (using a simple sum over all atoms i and 
their neighbours j of the form XJ2iJ2j l(V^ilV^i)P)- Overall, penalty functional methods 
have not been widely taken up. 



3.2.5. Orbital-free DFT Orbital- free DFT 326,327 returns to the original spirit of 



density functional theory, and seeks only the ground state charge density and not the 
associated orbitals, giving a huge advantage in terms of simplicity and speed. However, 
to find the energy and hence the ground-state density, a functional for the energy in 
terms of the charge density is required; this follows standard DFT methods, except for 
the kinetic energy, and much of the difficulty in orbital-free DFT lies in finding a kinetic 
energy functional for the charge density. An overview of different approaches can be 



found in a recent review article 328 



For a uniform electron gas, the Thomas-Fermi functional gives: 
Ttp = Ci. / dm(r)^/3 



(69) 



^1(37?^)^/'^. However, this is not sufficient for systems where the charge 



where Ck = 

may vary. Rapidly varying perturbations can be represented by the von Weizsacker 
functional: 



(70) 



These two functionals represent limits of behaviour, and early work on orbital-free DFT 
combined the two to form a single functional. While the functionals form the main topic 
of research, there is a further problem regarding pseudopotentials, which are in common 
use throughout physics. As the charge density is a local quantity (a function only of one 
position, r) the pseudopotentials used must be local only. This introduces a significant 
restriction, as much of the transferability and accuracy of modern pseudopotentials 
comes from the angular freedom given by non-local potentials (where dependence on two 
positions r and r' allows different potentials to be used for different angular momentum). 
Progress has been made in creating local pseudopotentials suitable for bulk use |329| , 
though this still causes problems, and restricts the transferability of the method. The 
most successful potentials are for metallic systems, and in particular those closest to the 
nearly-free electron model. 

Most of the work beyond this conceptual starting point is in creating new functionals 



330 332 , with extra freedom found by including density dependence in the kernel 333 



and non-local functionals 334 , 335 . An approach to allow the kernel to be calculated in 
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non-periodic systems 336 has enabled development of methods which combine finite- 
element modelling and coarse graining (with the properties of each element based on 
a single representative atom modelled with OFDFT) |337J. Extension to covalent 



systems and semiconductors have been introduced 338 , 339 ; the key step forward was 



the use of a non-local kinetic energy (KE) functional, involving two parameters which 
are transferrable within environments with similar coordination numbers. There is a 
freely-available implementation of orbital-free DFT, which has been demonstrated a 



calculation on one million atoms of Al 340 341| . Very recently, a new theoretical 



approach has been suggested |342| which writes the density as a functional of the 
potential (reversing the standard approach); this appears to give a well-defined route to 
kinetic energy functionals, though its impact on the field is yet to be seen. 

While this family of methods gives a good way to model large metallic systems, 
there is still the concern that the kinetic energy functional is not exact, and 
the pseudopotentials are limited, despite significant effort. The recursion methods 



(Sec. 3.2.3) are a viable alternative, though not widely used, and a detailed investigation 
of the relative accuracies of the different approaches, and their convergence, would be a 
valuable contribution. 

3.2.6. Expansion of the density matrix and tensorial approaches A recent class of 
approaches has emerged which use a change of variables (generally an exponential 
parameterisation) to impose the idempotency of the density matrix, or the 
orthogonalisation of the Kohn-Sham orbitals implicitly. These methods offer efficient 
routes to optimisation of conventional methods, as well as potential for linear scaling. 

The first proposal within linear scaling was an exponential parameterisation of the 
density matrix |343| , D. We write: 

D(X) = exp(-XS)Dexp(SX) (71) 

where X is an anti-symmetric matrix, and S is the overlap matrix between basis 
functions, as usual. This transformation preserves both the idempotency and trace 
of the density matrix, and gives a set of variables which allow a search for the ground 
state; note that the starting density matrix is key. The exponential can also be written 
in terms of a Baker-Campbell-Hausdorff (BCH) expansion: 

D{X) =D + [D,X]s + ^[[D,X]s,X]s + ..., (72) 
[A,X]s = ASX -XSA, (73) 

where the commutator in a non-orthogonal basis is notated [,]s- From this formalism, 
both Hartree-Fock and DFT can be made linear scaling, and an implementation has been 



given, with details on preconditioning of the minimisation 344 . A similar idea has been 



independently derived from geometric considerations of the minimisation 345 , though 



without linear-scaling application. A related, orthogonal basis method using curvy 



steps has demonstrated linear scaling 346 , and its generalisation to non-orthogonal 



bases is discussed in detail below. The method has also been shown to be effective as an 
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extended Lagrangian approach to Car-Parrinello molecular dynamics 347 . This type of 



approach, using an exponential transformation as a unitary transform, has been used a 
number of times in the past in conventional methods (e.g. for Car-Parrinello Molecular 
Dynamics 348]). 

Another example of the unitary transformation leads to an approach to optimisation 
of the density matrix with curvy steps 210 . Here, the unitary transformation is used 



to preserve idempotency and electron number, with the unitary operator written as an 
exponential, via a BCH expansion. Much care is devoted to tensorial correctness (as 



discussed below in Sec. 4.1 ). The transformed density matrix is written as a power series 



e . Using tensor notation (explained below in Sec. |4.l[), they 



following the exponential expansion, which is again written in terms of an anti- Her mitian 
operator. A, with U 
write: 



j=oJ- 



p[j]Y A° -A^ fpl-'l 



(74) 
(75) 

(76) 



While the matrices in these equations are mixed, fully contravariant matrices, such as 
the density matrix, are simpler to work with, giving symmetric commutators. The 
gradient can be found as S~^HK — KHS~^ in the contravariant notation; taking a 
step in this direction and expanding out the energy in terms of the density matrix 
gives a polynomial search to some order, rather than the usual linear assumption. If 
the expansion for the expansion is truncated at linear order, then the gradients for the 
LNV method are recovered. As the expansion is truncated, Idempotency is not exactly 
conserved, and must be re-applied using McWeeny iterations. The main advantage of 
the method is that it allows longer steps to be taken than would be possible with a 
linear method. 

A method for the direct minimisation of the wavefunction coefficients while 



remaining on the Grassmann manifold 349 has also been developed (the Grassmann 



manifold arises when the energy of the system depends only on the space spanned by 
the orbitals, and not on the orbitals themselves: the transformation from eigenstates 
to Wannier functions remains on the Grassmann manifold, for instance). The basis 
set coefficients are written as a matrix $, which is defined as a transformation of the 
bands ^ . Then the overlap of the basis functions is incorporated into a covariant matrix 
<|)t = ($tS<l>)^^<l>"l', with the application of ($^5"$)"^ as a metric. Writing the energy as 



E = Tr 



$tH$ 



and minimising with respect to $ gives a method which automatically 
includes the constraint of idempotency, and stays on the Grassmann manifold to first 
order. The inverse overlap matrix, S~^, must be applied for tensorial correctness; the 
authors suggest finding this via inversion of a submatrix of S made only from the orbitals 



within the localisation region 226 . The resulting method is closely related to Wannier 



function methods described above in Sec. 3.2.1, particularly Ref. 226 



Another application of the parameterised unitary transform is an approach to 



0{N) Methods 



42 



sparsity 350 . The £i-norm (defined for a vector x as £i = J2j compared to tlie more 
usual £2 = \JHj x]) is used as a sparsity measure for the wave-function coefficients. The 
key idea of the method is to perform unitary transformations on the orbital coefficients 
so as to maximise the sparsity of individual columns, using the £i-norm. The resulting 
method has a slight restriction, in that only gradients for steepest descent have been 
defined, but it shows promise, and is intended for linear scaling. 

Unitary transformations and sparsity have also been used within the CP2K code 
[351 352| . The normal constrained problem (with the constraint being orthogonality of 
eigenstates) is transformed to a locally unconstrained one; it is suggested that linear 
scaling behaviour should result for sparse problems 352 . The method is based on an 
orbital transformation 351 : the wavefunction coefficients, C, are transformed to new 
variables x: 



c(xj 
U 



C X 



'Sc(x) 



Co cos(U) + xU"^ sin(U) 
(x^Sx)i/2 

I 

I Vx 



(77) 
(78) 
(79) 
(80) 



Then x can be used in a minimisation, and remain linear (compared to previous ideas). 
The constraint applied is that x^Sco = 0. In the first-proposed form, a diagonalisation 
of x'^Sx is required to get eigenvectors. The linear scaling method 352 uses iterative 



refinement for the orbital transformation (OT/IR) — a function is defined such that 
fniZ) ~ fiZ) such that Z'^SZ = 1. The authors propose to use the method by 
Niklasson |353| , with fourth order found to be particularly efficient: 

1 



MZ) 



128 



Z(315 - 420r + 378r^ - 180F'' + 35F' 



where Y = Z'^SZ. The iterative refinement uses /„(. . . fn{Z) . . .). To achieve linear 
scaling, a Taylor expansion is made for matrix functions rather than diagonalising; after 
a conjugate step, iterative refinement is used to reimpose the constraint. The main 
drawback with the method is that the preconditioners required are based on dense 
algebra, giving 0{N^) scaling, but this may be lifted. 

These methods are not yet widely used, but sit at an interesting junction between 
standard approaches and linear scaling ones. The mathematical identities underlying the 
methods may well find wider use in linear scaling applications, if they can be translated 
efficiently. 



3.2.7. Quantum Chemistry The general area of wavefunction-based methods (starting, 
for instance, with Hartree-Fock wavefunctions and adding correlations via perturbative 
methods such as M0ller-Plesset (MP) methods or coupled-cluster approaches) tends to 
be known as quantum chemistry. Many of the approaches in this area scale prohibitively 
with system size: the simplest, MP2, scales asjTiiptotically with N^; coupled-cluster 
single double (CCSD) methods scale as A^^ and CCSD with perturbative triples. 
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CCSD(T), scales with A^^. Local correlation methods 354 can reduce scaling, and 



various other methods have been proposed which build on the ideas of locality for 
correlation. This is an area of sufficient complexity which has been reviewed elsewhere 



354,355 ; below, we will briefly summarise some key ideas. 



The MP2 approach in its canonical form involves pairs of integrals between occupied 
and virtual molecular orbitals. Local approaches to correlation reduce the scaling, 
but it is possible to produce linear scaling methods. The key step in one approach 
was the development of Laplace MP2, where the exact energy is written in terms of 
non-canonical orbitals, via a Laplace transform of the original energy denominator. 
This can be extended to use atomic orbitals, which gives asymptotic A^^ scaling; by 
defining spherical interaction domains centred on atoms, efficient linear scaling has been 
demonstrated 356 . Another implementation of the same approach uses multipole- 
based integral estimates for screening 357 . An alternative approach builds on the 



local correlation methods [358j, and uses density fitting and explicitly correlated wave 
functions (which depend on inter-electron distance, and improve the convergence of 
the method) |359| . It is also possible 



360 to work in terms of atomic orbitals, and 



truncate the excitations based on the number of atoms involved in excitations (yielding 
a hierarchy of methods which can lead to linear scaling). This has been extended with 



a method for forming localised orbitals 173 , with related work leading to a coupled 
cluster algorithm which scales nearly linearly 361 . The implementation of some of 



these methods in a standard code has been discussed 100 



These ideas have also been extended beyond MP2: linear scaling CCSD has been 
demonstrated using non-orthogonal localised molecular orbitals confined to fragments 



362 and by expanding the coupled cluster wavefunction in a local basis formed from a 



divide-and-conquer approach (see Sect. 3.2.2); the localisation is adapted dynamically 
to ensure error control 1261]. MRSD-CI has also demonstrated linear scaling by using 



local correlation and integral screening 363 . 



The divide and conquer method itself (Sect. 3.2.2) has also been extended to 

In both cases, the full HF orbitals from the subsystem 



MP2 364 and CCSD 260 



are used (as opposed to other quantum chemistry approaches which typically locahse 
the molecular orbitals or use atomic orbitals). Quantum chemistry is showing great 
promise in the area of increasing system size; at present, calculations on many tens of 
atoms are possible, and this trend should continue even for the most expensive methods. 



3.2.8. Extensions While much of the work on linear scaling methods has been devoted 
to finding the ground state efficiently, there is also effort being put into going beyond 
the ground state to model the response of large systems to perturbations, in particular 
excitations. In this section, we briefly survey this work, though it is not a comprehensive 



list; other reviews cover parts of the ground in more detail 70,355 



One obvious route for extending DFT is to perform real-time propagation of the 
density matrix (instead of the wavefunctions) within the framework of time-dependent 
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DFT (TDDFT). It can be easily shown that the time variation is given by: 
dp{t) 



ih- 



dt 



[Hit), Pit)] 



12) 



though there has yet to be any comprehensive investigation of the effect of truncation 
on the accuracy of propagation. This has been implemented 365 following earlier work 
on time-dependent Hartree-Fock 366 368 . It has been extended to density functional 
tight binding 369 and applied to calculating spectral properties 370 using Chebyshev 
polynomials to expand the exponential of the Hamiltonian. However, the authors note 
that the amount of effort required is still high (and propagation for around 35fs is 
needed for O.leV energy resolution in medium to large systems, though the size is not 
quantified). Another approach to real-time propagation TDDFT 371,372 uses random 
vectors and a projection method (as described in Sec. 3.2.3) to calculate response 
functions. Using an empirical pseudopotential, it has been applied to calculation of 
optical properties of silicon nanostructures 373 .2 

Standard implementations of TDDFT normally avoid real-time propagation, and 
instead search for solutions at the linear response level. The time propagation, Eq. (82), 
can be recast in terms of a Liouvillean superoperator, £, whose eigenvalues represent 
vertical excitation energies. If the full Liouvillean is used, this is the random phase 
approximation (RPA), while if certain off-diagonal terms are neglected, the Tamm- 
Dancoff approximation (TDA) results. By using projectors onto the occupied subspace 
(the density matrix, P) and its complement (Q = I — P), the eigenvalues of the 
Liouvillean can be found [374]; while Krylov subspace approaches (closely related to 
recursion methods described in Sec. 3.2.3) are efficient, a direct variational method based 
on Rayleigh quotients 375 is also tested, and has been extended 376 and applied to 
carbon nanotubes and polymers. A different approach to speeding up TDDFT also uses 
recursion 377 . Again, the method avoids explicit representation of the virtual orbitals 



(a common theme in methods to improve speed and convergence of TDDFT and many- 
body perturbation theory), though it is not linear scaling in its present form. Another 
efficient, though not yet linear scaling, TDDFT method uses Lagrange functions (see 
Sec. 2.3) and domain decomposition (see Sec. 3.2.2) |378|. 



Density matrix perturbation theory 379| (which has been extended to non- 
orthogonal basis functions |229| ) uses the trace-correcting TC2 method to generate a 
sequence of density matrices iX^^ for the unperturbed Hamiltonian). The expansion 
Xn = X^^ + An allows a recursive expression for A„+i to be derived in terms of A„ 
and X^\ The method is easily extended to different levels of perturbation theory. The 
first application 



380 



was to calculations of polarizability of water clusters, going to 
150 water molecules with a 6-31G** basis set. Other groups have built on this method 
for calculating polarisation. One approach 381 uses only the non-diagonal part of 
perturbation to find polarizability and the Born effective charge in solids. A different 
approach to the coupled-perturbed equations 382 allowed linear scaling calculation of 
the derivative of the density matrix with respect to a parameter (e.g. atomic position 
or electric field) using the McWeeny expansion; this method has been made numerically 
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more stable and applied, among other things, to calculation of NMR 383 . It has 



recently been reformulated in terms of a Laplace transform (384] (in the same way that 



MP2 calculations, described in Sec. 3.2.7 were reformulated) for greater efficiency. A less 
complex version of this approach has also been given, based on the projection properties 
of the density matrix; it has been shown to be competitive with other linear scaling 
methods 385 . A post-hoc method for calculating polarizability with linear scaling 
uses another variant of perturbation theory. A further approach to molecular 



386 



response 387 uses an exponential parameterisation (as described in Sec. 3.2.6) to find 
excitation energies and polarizabilities. 

Finally, it is clear that band edges can be found from linear scaling methods (for 
instance, see the discussion in the work by Stechel 226]). Recent developments ]388| 
have made the search more efficient. The maximum eigenvalue of pH is sought after 
finding the ground state, using the density matrix as a projection operator; equivalently, 
the minimum eigenvector of {I — p)H is sought for the LUMO. The simplest solution 
uses the Lanzcos algorithm for an extreme eigenval. The method has been applied to the 
case of a doped semiconductor where there are mid-gap or band-edge states. Band-edge 



states can also be found efficiently using iterative purification methods 389 



4. Technical Details &; Parallelisation 



4.I. Non- orthogonal Basis Functions 

In general, the localised orbitals used as basis functions in most linear scaling methods 
are non-orthogonal; it has been shown that these functions are more contracted 



169 



and give computational advantages in various systems (e.g. silicon 390 and organic 
molecules 170| ). This introduces complications in maintaining the correctness of 
tensors, and in defining different types of operator representation. The clearest 
notation uses covariant and contravariant notation (lower and upper indices), which 
was introduced for recursion methods with an excellent overview 289] . There is also 



a general explanation of the notation, and an application to second quantisation 391 



The key implication for linear scaling methods is the need for a good approximation to 
the inverse of the overlap matrix, or its decomposition, which can be found in linear 
scaling time. We summarise the basics below, and urge the interested reader to find 
more in the references given. 

If a non-orthogonal basis is defined, {le^)}, then the overlap matrix has elements 
defined by: 

Sal3 = (ea|e/3) = S^^, (83) 

where we have assumed that the basis is real. Any matrix represented in terms of the 
original basis and notated with two lower indices is called covariant; it is actually a 
tensor. There also exists a dual basis, {|e")}, which satisfies the relation: 



/3 
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A matrix represented in terms of the dual basis and notated with two upper indices is 
called contravariant] it is also a tensor. It is easy to show that, for a complete basis, we 
can write: 

ElO(e1 =ElO(ea| =X (85) 

a a 

where X is the identity. The overlap between elements of the dual basis also forms the 
inverse of the overlap for the overlap matrix in the original basis; confusingly, different 
authors notate this in different ways. If we write T"^ = (e"|e^), then 

(86) 
(87) 
(88) 

')S^p{e^\=I (89) 





= ^"/3 




= |e°) 








= E 


a/3 


al3 



where we have used the Einstein summation convention in Eqs. (86)-(88). Some authors 
write T"^ = (S^^)"^ , while others write T"^ = S"-^ . Some care has to be taken when 
considering different representations: as well as fully covariant and contravariant forms, 
there are mixed forms, such as (e"|A|e/3) = A"^. A Hermitian operator is represented 
by a Hermitian matrix in the co- or contravariant forms, but not in the mixed form: 

Ho^p = {Hp^y (90) 

^a/3 _ (^H^o^Y (91) 

Hi = {H^y (92) 

(93) 

This necessitates careful notation, with the position of the indices (i.e. whether the 
upper or lower index comes first) being significant. 

It is when considering differentials and the difference between classes of tensor that 
the notation and choice of metric becomes important. If the Hamiltonian is represented 
in terms of the original basis then it forms a covariant tensor, and the density matrix 
is a contravariant tensor. But the differential of the energy with respect to the density 



matrix (as used in methods such as the LNV technique described in Section 3.2.1) is 
covariant, and should be scaled by an appropriate metric before it can be added to the 
density matrix 392 ; this metric is T"^ (we note that it is possible to proceed with a 



different metric, which is equivalent to the original formulation 204| ). If we write the 



auxiliary matrix a{r,r') = J2ai3 4'aiii^)L°'^(j)fs{r') and search for a minimum energy with 
respect to L matrix elements, for instance, then one search step in the minimisation 
might be written: 

dE , , 

mF^ = (94) 

a"^ = T'^^a^sT^^ (95) 

L^P = Lf + Xa""^ (96) 
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where A is a varying step length. Of course, the problem can be formulated in terms of a 
mixed representation, as suggested by Stephan 196 , where all matrices are mixed. This 
requires that the Hamiltonian be premultiplied on the left by T°^, which connects closely 
with inverse preconditioning approaches to minimisation, such as the AINV method 
described below; we also note that this goes back to observations on non-Hermiticity 
for localised molecular orbitals 129 . The search step is now written: 
dE 



a 



(97) 



L"^ = (L"^)^ + Aa°^ (98) 

An alternative approach is to effectively orthogonalise the basis, which can be achieved 
by transforming the matrices to an orthogonal representation. Some decomposition of 
the overlap matrix is required for this; either a symmetric one (using 5"^/^ the Lowdin 
transformation) or a non-symmetric one (e.g. a Cholesky decomposition 207 , where 
S = UU'^ and transforming with U~^). Either way, we require either the inverse overlap 
matrix or the inverse of a decomposed matrix; naturally, these must be found using 
a linear scaling methoc^ Cholesky decomposition can be made to scale linearly with 
system size for sparse matrices 207,393 , though these techniques are extremely hard 



to parallelise efficiently 394 



There is therefore a need to find the inverse or decomposed inverse of the overlap, for 
sparse matrices with linear scaling time (and ideally good parallelisation). An excellent 
overview of approaches from a computational science stance |395| makes the important 
point that the sparsity pattern of the inverse of a matrix may not be same as that of 
original matrix. This raises the problem of how sparsity is imposed, which is discussed 



fully below in Sec. 4.3 We note that some of the methods in Sec. 3.2.1 effectively 



converge on the inverse overlap matrix. The range and sparsity pattern of are 
extremely important, as is the condition number of S. It can be shown that, for a 
localised 5, is exponentially localised 204 , though the range will depend on the 
spread of eigenvalues of 5*. The condition number of the overlap matrix (the ratio of the 
largest to the smallest eigenvalue) will determine how easily an inverse can be found; the 
condition number will depend on the basis and the number of support functions/localised 
orbitals |396|. 

An iterative approach, known as Hotelling's method or Schultz's method 34 , is 
extremely effective: 

Xn+i = 2X„ - XnAXn (99) 

will converge quadratically on the inverse of A, so that X^o = A^^. The iteration must 
be started from a suitable initial guess (which can be shown to be Bo = eA, with e a 
small number). This formula appears in a number of places: this iterative approach to 



updating an inverse from a previous step (or close to convergence) was suggested 226 



% We note that various authors use cubic scahng methods to find the inverse overlap, on the grounds 
that the prefactor for this operation is rather small; while a pragmatic approach, it is not a scalable 
one. 
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The close relation to iterative purification methods (Sec. 3.2.1) should not be surprising, 
as the inverse overlap coincides with the density matrix when only occupied states are 
considered. The main drawback of this approach is that the iteration stalls when the 
truncation error is similar to the change in inverse at a single step. 

A divide-and-conquer-like algorithm was first suggested in the context of recursion 

-i^a7 jj^^ neighbours 



methods 



397 



in order to form the matrix elements = (S~ 
P of an atom a, a series of linear equations in clusters centred on each atom a in 
the system are solved: H = MS. A similar idea was put forward using conjugate 
gradients 226 : the problem can be written as J2j SijDjk = Sik, and conjugate gradients 



is then used to solve a least-squares problem with ^f^'^^jk — [SD]'^ji, for each localised 



orbital k, with A 



(k) 



: the problem can be written as Y,j SijDjk 

Sjj^, or set to one only if within range). Enforcing symmetry and 
idempotency {D = DSD) for the diagonal elements was found to improve stability. An 
initial guess for the inverse is taken to be Sjk or 26 jk — Sjk- Linear scaling follows from 
enforcing sparsity on the matrices. 

Defining the generalised inverse of the overlap matrix, S^ , by SS^S 
a truncated approximation can be found by minimising: 



S 239 398 



Tr[BS-] = minTr[5(2X - XSX)] 



(100) 



where X is constrained to be Hermitian and B is any positive definite matrix. It was 
shown that a variational expression for the energy can be written for any number of 
localised orbitals (i.e. including unoccupied states) if the density matrix is taken to be 
K = 2X — XSX and the energy is minimised with respect to X. (A similar expression 
was found before 226 , though without the variational derivation.) 

The AINV method |399] has been used by Challacombe [209] ) route to 

form S^^H. The approximate sparse inverse is constructed from a sparse incomplete 
factorisation of the overlap; while this is an effective method, it is hard to parallelise. If 
the overlap is assumed to be decomposed as S* = LL^, traditional methods find L and 
then create Z = by an incomplete linear solution. This can introduce inaccuracies, 
and the AINV method avoids these by solving directly for Z = L~^. Once Z has been 



found, it is possible to create ZiZ'^ A) = S~^A without ever creating the inverse 209 
which may be dense or have unusual sparsity patterns. 

Ozaki has proposed using the recursion method, normally applied to finding the 
density matrix, to solve for the inverse overlap 400 . He applies the block BOP method 



(described above in Sec. 3.2.3[ ) to the resolvent: 

R{z) = {s - ziy^ (101) 

for Z a complex number. It is clear that S^^ = Re[i?(0)]; however, for a method 
implemented with finite ranges, the matrix must be symmetrised to ensure stability. 
The method is also compared to three other approaches described above: the divide- 
and-conquer-like method 397 ; Hotelling's method; and a Taylor expansion approach 
(as suggested in unconstrained minimisation methods 163 , 166 described above in 
Sec. 3.2.1). All methods are effective for diamond, though the recursion method is 
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slightly faster, while the Hotelling method yields larger errors for fee Al. The choice 
of Taylor expansion (simply using a polynomial) precluded testing on the systems, as 
it did not converge. A dual-basis recursion method has also been suggested [296], with 
the dual basis calculated simultaneously with the basis (once a starting state has been 
defined) . 

A method for improving an approximate factorisation of an inverse 353| (or an 



IS ) iterates Z 



n+l 



inverse itself, by decomposing S 
Xn = Z'^SZn] this method is related to another iterative approach 401 



^oOfcX^J) with 
The obvious 



drawback is the lack of an initial factorisation, but this has been solved 402 by starting 



with a recursive decomposition of the S matrix, which allows a convergent calculation 
of the inverse. An extremely similar method has been derived for the symmetric square 



root 403 ; both discussions also point out that convergence can be improved by scaling 



the overlap so that the eigenvalues lie within a convergent radius. 

In all these approaches, there is the problem of whether the inverse (or the 
decomposed inverse) should be sparse at all, or share the same sparsity patterns as 
the overlap; in this sparsity algorithm based on drop tolerances may give 

some advantages. VandeVondele 



69 



deliberately optimised basis sets to yield overlap 
matrices with small condition numbers, and show revealing data on the sparsity of 
inverse overlaps with basis size: the sparsity decreases as basis increases. Plots of 
number of non-zero matrix elements for different truncation thresholds and different 
systems 181 also show that the inverse is less sparse than the overlap. It is not yet 



unambiguously clear whether convergence can be achieved for ill-conditioned S matrices. 



4-2. Preserving Electron Number 

When varying the density matrix or localised orbitals to find the ground state, as well as 
maintaining idempotency (which has effectively occupied most of the methods discussed 
so far), the correct number of electrons must be maintained, as mentioned above. It is 
also possible to work at a fixed Fermi level (chemical potential for electrons, as suggested 
in one of the earliest methods 201] ) though this is often a less physically reasonable 
approach. A grand potential is often defined: 

n = ^Tot - fiN, (102) 
= Tr [K {H - ^/)] if = (103) 

= Tr [K {H - fiS)] if {(f)M = Sap (104) 

An early tight-binding approach [404 included the derivative of the electron number 
in the gradient of the energy with respect to L matrix element, with the chemical 
potential from the previous step used for /i. This is similar to the approach used in the 
Conquest code [205], though instead of using the previous value of /z, the Conquest 



code projects out the direction of electron change (dNe/dLap) from the search direction. 
After the line minimisation for energy, a separate search for the correct electron number 



should be performed 205 404 
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An alternative is to treat the auxiliary density matrix L as the real density matrix, 



and use the McWeeny purification to alter the gradient 207 . In this case, the electron 
number becomes Ng = Tt[LS] in a non-orthogonal basis, and it can be shown that 
a traceless gradient can be found by defining a slightly different grand potential, and 
treating as a parameter: 

n{L)=TT[KH] + i2{TT[LS]-Ne) (105) 
l_i = - Ti [3{HLS + SLH) (106) 
- 2{HLSLS + SLHLS + SLSLH)] /N^ 

The key advantage of working with this functional is that the search for the ground state 
does not perturb the electron number, so that given a starting point with the correct 
electron number, only /i needs to be updated. However, the density matrix so defined 
will be less idempotent than K will be, imposing an additional approximation. 

The final approach taken to maintain correct electron number is simply to rescale 



the density matrix, either after each line minimisation 211 or continuously 212 
If the auxiliary density matrix is used as the density matrix, then a scaling factor 
L — > NeL/TT[LS] is applied at each step. Alternatively, the purification transformation 



can be adapted to form a purification and normalisation transformation 212 



K - N 3L^L - 2LSLSL 

'Tr[{3LSL-2LSLSL)Sy ^ ' 

where Ng is the number of electrons in the system. This transformation potentially 



introduces multiple minima, though is reported 212 not to adversely affect convergence. 



4-3. Parallelisation and Sparse Matrices 

In this section we consider two important technical problems: parallelisation of linear 
scaling codes, and the implementation of sparse matrix methods. Sparse matrices are 
key to linear scaling computational time and storage, while efficient parallelisation is 
needed for access to systems of more than ten thousand atoms or so. We will consider 
sparse matrices first, and then turn to strategies taken for parallelisation. 

We must consider both the technology of sparse matrices and how sparsity is 
imposed, and what errors different approaches will impose; the two main approaches 
to sparsity (or truncation of a matrix) are: first, to consider a distance-based criterion 



(appealing to the results of Sec. 3.1 , so that an element is set to zero and neglected if the 
distance between atoms or localisation centres is greater than some cutoff); and second, 
to drop an element if its value falls below some threshold. The first approach tends to 
give a clear, well-defined sparsity pattern while the second requires application of the 
tolerance after each operation. It is, however, easier to estimate and control an overall 
error due to sparsity with the second method. A recent analysis of matrix sparsity 



and how the function of a sparse matrix decays 405 provides an excellent, in-depth 
understanding of sparsity. It is highly recommended for all those developing methods 
in this area. 
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An early implementation of a tight-binding orbital minimisation method [165 242 



used distance-based truncation, and stored matrices in a compressed row format, 
following a parallelisation described below. The communication pattern used for 
matrix multiplication followed a synchronous, cartesian-based technique (for a processor 
at the centre of a cube, communicating with 26 other processors, communicate 
with neighbouring processors in the sequence: (1, 0, 0); (1, 1, 0); (0, 1, 0); (—1, 1, 0); etc). 
Similar approaches occur in other methods based on the orbital minimisation family 



243 406 



The first approaches to drop tolerance used a criterion based on the individual 
matrix elements 206 207 . Challacombe 209 introduced a sparse matrix algebra based 
on atom blocks which are dropped if the Frobenius norm of the block (defined as 
Ap = \J AjjAjj for atoms / and J) is smaller than a specified tolerance. The tolerance is 
re-applied after each matrix operation (e.g. addition or multiplication). The reasoning 
for using blocks rather than elements is that a change of basis is less likely to change 
sparsity patterns and associated errors. As the number of electrons on an atom can be 
related to the trace of the on-site block, this seems reasonable. A sparse-approximate 
matrix multiply (SPAMM) |209||407| has also been introduced: the contraction over 
two atom blocks in a multiply (C/j = AjkBkj) is neglected if Bp is smaller than 
the threshold divided by the number of blocks and Ap. This leads to small elements 
being treated more approximately than large ones, and potential computational savings. 
The method has been generalised 407 to a recursive approach on successively smaller 
sub-matrices; this bears some resemblance to an interesting approach to sparsity, that 
of hierarchical or "H-matrices 408 , though these have not been used in linear scaling 
methods to our knowledge. 

A sparsity analysis 396| of matrices in electronic structure methods (pointed 
towards quantum chemistry methods) concentrated on linear alkanes, and applied 
drop tolerances. For the Hartree-Fock method, provided that a well-conditioned set 
of localised occupied orbitals exist, they prove both that the density matrix is localised, 
and that the overlap has a localised inverse. However, it does not follow that a 
localised density matrix can always be found (nor that localised orbitals can be found); a 
demonstration of the inverse of S was shown before 204 , and its significance is discussed 
above in Sec. 14.11 

Conquest 216,410 truncates following distance-based criteria, and uses atom- 
blocks and compressed row storage, with each on-site block stored in the place of a 
matrix element. An intermediate level of organisation between atoms and the unit 
cell (called partitions, and described in detail below) is used to distribute storage and 
communications for multiplies. Special matrix-multiplication routines 410 have been 
developed, with a matrix kernel isolated to allow optimisation. There are two multiply 
routines, depending on the radius of final matrix compared to the initial matrices 
{extension when the final matrix has a larger radius than the other two, and reduction 
when the final matrix has a smaller radius). A similar approach to division of the unit 
cell and communications is used for the grid points (where the charge density is found 
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Figure 6. Segment-by-segment filling factors of sparse matrices in typical large 
systems divided over P=64 processes. Matrices of the sparsity pattern KS (the 
product of the density kernel and overlap matrices) are shown for: (a) a (10,0) zigzag 
nanotube (4000 atoms), (b) a 64 base-pair sequence of B-DNA (4182 atoms), (c) a H- 
terminated wurtzite-structure GaAs nanorod (4296 atoms), and (d) 8 (8x8) supercells 
of eight-atom cubic unit cells of crystalline Si (4096 atoms). Each pixel represents a 
segment, whose color shows the fraction of matrix elements in that segment which are 
nonzero: black segments contain no nonzero elements, through red, then yellow, to 
white segments containing all nonzero elements. The nonzero elements are seen to be 
clustered near the diagonal of the matrix (though less so with increasing periodicity and 
complexity of the structure) . The space-filling curve ensures that in a given column, 
there are nonzero overlaps only with rows of atoms on nearby processes, so the nonzero 
elements form a broad band centered on the diagonal. This is clearest for the simple 
structure of the nanotube, but even for the crystalline solid, there are segments in 
which there arc few or no nonzero elements. Reprinted with permission from N. D. 
M. Hine et ai, J. Chem. Phys. 133, 114111 (2009) [409]. Copyright 2009, American 
Institute of Physics. 



on a real-space grid). Matrix multiplication efficiency has been shown for a number 
of systems, including large bulk silicon cells of up to 16,384 atoms 410 and recently 
perfect linear scaling and efficiency has been shown for systems containing millions of 



atoms 411 



ONETEP 



412 uses atom-blocking, with data for columns of matrices stored on the 



atom-responsible processor. Hand-coded multiplies for block sizes relating to numbers 
of valence states (1,4,5,9,10) are used for efficiency. The developers suggest that sparsity 
of 90% or more is needed to beneffi from routines (especially when going to product 
matrices without truncation); their analysis shows that the matrix KSK is 73% dense 
even for an 8,000 atom system. More recent work 409 allows combined dense/sparse 
operations. Matrix columns are again divided into columns, and then further into 
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process segments after assigning columns to processors. Each segment is designated 
dense (stored in full) or sparse based on fraction of non-zero elements and a threshold. 
Plots of density matrix structure for different systems are shown in Fig. [6} illustrating the 
different sparsities found in varying simulations. They show an analysis of performance 
vs sparsity threshold (which is typically set around 0.3 fraction), and see up to factor of 
two improvement over the original approach. Calculations have been demonstrated on 
systems with up to 36,000 atoms of silicon (as well as 16,000 atoms of DNA and 3,200 
atoms of alumina). 

A multi-atom blocked approach to sparse matrix multiplication 413 has been 
developed for quantum chemistry-based approaches. The method divides the cell along 
cartesian dimensions in a binary way if the cell dimension is larger than some cutoff. 
Re- This gives the multi-atom blocks, which necessarily include some zeroes, but allow 
use of efficient BLAS calls. The block size depends on the basis, from about 30 for a 
minimal basis to about 70-80 for larger bases; the overall method is significantly faster 
than element-by-element sparse multiplies. 

A study of truncation methods 235 suggested that distance-based truncation 
does not allow error control, and proposed a drop tolerances method based on sub- 
matrix magnitude using a norm, as well as implementing sparsity control during matrix 
multiplication for efficiency. In this original paper, the 1-norm was proposed. A 
hierarchical approach to matrix storage evolved out of this work 414 , which subdivides 
a matrix into sub-matrices, each of which can be further divided into more sub-matrices. 
At the lowest level, a matrix consists of real numbers; they found that five levels was 
enough for 36,000x36,000 matrices. The method permits blocks which are not related to 
atoms for performance reasons, and specifies uniform block sizes (32x32 in the examples 
given) at the lowest level. The developers use their own algorithms for symmetric square 
and inverse Cholesky based on symmetry, and show good performance and reduced 
storage relative to optimised libraries. The sparsity of the density matrix was studied 



for different molecules 181 . The effects of truncation were analysed, and possible 
ways of truncating matrices to a given tolerance were examined (so that the error 
introduced by truncation is controlled). A Euclidean norm-based method is accurate 
but computationally expensive, while calculation of Frobenius norms scales poorly with 
system size; they suggest a mixed norm based on the Euclidean norm of blocks, where 
the Frobenius norm of the block is found. A related idea 415 uses a tolerance based 
on either the number of atoms within a localisation region, or a dynamically updated 
number of atoms based on the residual of the localised functions. We note, however, 
that a drop tolerance approach can be less scalable than a distance based approach, 
as the sparsity pattern changes with each iteration. The distance-based truncation is 
variational, while the drop-tolerance truncation may not be (and is often used in non- 
variational methods such as the purification methods). 

The parallelisation of linear scaling techniques requires considerable care: the 
balance between communications and calculation will affect efficiency, and significant 
numbers of operations or variables which require storage or work on all atoms in the 
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system on all processors will lead to A^^ scaling, as well as to memory problems for large 
systems. Nevertheless, owing to the spatial locality inherent in the methods, they are 
natural candidates for parallelisation, and impressive scaling results can be achieved. 
We recollect the two types of scaling in common use: weak scaling, where the load 
per processor is kept fixed and the system size and number of processors are increased 
simultaneously; and strong scaling, where the system size is fixed, and the number of 
processors is increased. Efficiency for strong scaling is harder to achieve than for weak 
scaling, though weak scaling may well be the mode of operation chosen for linear scaling 
codes. 



An early TB method 242 divided space into a number of parallelepipeds equal 
to the number of processors, and assigned all atoms in a parallelepiped to a processor. 
The processor is then responsible for calculating forces and positions of atoms in the 
parallelepiped, and for storing matrix rows corresponding to those atoms. As this 
method also uses localised Wannier functions (LWF) with well-defined centres not 
necessarily associated to an atom, the processor is also responsible for all LWF whose 
centre lies within its parallelepiped, and any matrices indexed purely by LWF. Scaling 
on up to 512 processors and 85,000 atoms was demonstrated, and an extensive analysis 
of scaling was made, noting that as the volume assigned to each processor decreases 
relative to a boundary area (due to localisation radii) the amount of communication 
will change from depending on the number of processors (as Npj,Q(f) to depending on 
the volume of the boundary. The same approach has been used for an implementation 



of orbital minimisation 164 within an ab initio tight binding method 243 , though 



MPI and OpenMP parallelisation are shared; the resulting code was demonstrated on 
up 1,024 processors and 6,000 atoms. 

Another implementation of orbital minimisation 406 spent considerable time and 
effort on sparse matrices and parallelisation to allow efficient molecular dynamics. 
Matrices were stored as orbital blocks and neighbours of atoms (described as four 
dimensional storage). The merits of particle vs spatial distribution of atoms between 
processors was discussed; in particular, list calculation (using the link-cell method) 
relies on locality for efficiency. They chose particle/orbital division for simplicity (and 
as the method was communications limited). The problem of symmetric matrices and 
distribution is also considered: if symmetry is used to reduce storage, even distribution 
becomes more complex. Load balancing is achieved dynamically, by subdividing the 
system into 3D blocks. The blocks are ordered by x, y, z, with the atoms in each block 
ordered by z, y, x. The assignment to processors is balanced to even work (the time 
per site is calculated roughly). The resulting method showed good weak scaling, and 
reasonable strong scaling (the normal problem when going to few atoms per processor 
resulting in communications dominating). 

The approach in the MoNDoSCF code (now called FreeOn) |416 is to order 
the atomic coordinates using a space-filling curve, so that atoms which are close in 
space are close on the curve. An overlap of communications and computation is 
achieved by posting a series of non-blocking receives (MPI_Irecv) and using blocking 



0{N) Methods 



55 



sends (MPI_Send). It is suggested that this arrangement (rather than a series of non- 
blocking sends followed by blocking receives) is more efficient, and less likely to lead 
to communication imbalance. Load balancing in the code is achieved by minimising 
the imbalance of a characteristic matrix (typically the Fock or density matrix) based 
on the distribution of numbers of non-zero elements between processors. The resulting 
tests show reasonable parallelisation up to 16 processors and sustained performance and 
efficiency increases up to 95 processors. Significant effort has been put into the linear 
scaling calculation of the Fock matrix 103 417 . The exchange-correlation matrix is 



calculated by hierarchical cubature 108 : a cartesian grid is divided into blocks, with 
load balancing done dynamically to balance times. The key assumption is that the 
computational time is proportional to the total charge in box. Good scaling is found, 
and the approach is linear scaling. A similar approach is taken to the Coulomb problem 



(as described in Sec. 2.5). 
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Figure 7. Linear and parallel scaling for bulk silicon on 512-4096 cores. The insets 
show total time and total energy (made positive to enable log plot) while main graph 



shows increase in time with system size. From Ref. 411 . 



Conquest 216, 410, 418 



sub-divides the unit cell into small orthorhombic 
partitions of atoms and blocks of grid points (which are not necessarily the same 
shape and size), which are then distributed between processors using either space- 
filling curves |418| or an optimisation procedure which considers both communication 
and computation time. Communication is performed by small group (leading to a 
compromise between local storage of unneeded elements and communications efficiency) 



216 . The indexing and searching rely on different sets 410]: the primary set (the atoms 



or blocks for which a processor is responsible); the halo (all groups within range of a 
primary set); the covering set (an orthorhombic set of groups within range of a primary 
set for searching over). Processors are responsible for group of atoms (a bundle) and 
grid points (a domain) which should ideally overlap to reduce communications overhead. 
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Scaling up to 4,096 processors and 2,000,000 atoms has been demonstrated 411 



as 



illustrated in Fig. [7j The code shows perfect weak scaling from 8 to 4,096 cores, and 
reasonable strong scaling (hut clusters of Ge on Si(OOl) with either 11,620 or 22,746 
atoms showed excellent speed up from 16-128 cores and still 20 times faster going from 
16 to 512 cores) |4lT . 



There have been significant developments recently in the parallelisation of onetep 
419 . The first scheme used for parallelisation of matrix multiplication 412 started on 



the diagonal and looped over processors using the modulus of the number of processors. 
This approach was much more efficient than the original communications which used an 
inefficient MPI_Bcast call. The localised orbitals (non-orthogonal generalised wannier 
functions or NGWFs) are distributed by sending lists of parallelepipeds and psinc 
coefficients on the grid points in each parallelepiped. A batch system is used on columns 
(to get as many NGWFs as will fit in memory), with the most intensive operation being 
the Hamiltonian acting on the NGWFs. Using this scheme, scaling up to 27,000 atoms 
and 256 cores was demonstrated, though the speed-up when increasing number of cores 
by a factor of four was only 2.5. A new method 409 uses segments in each matrix to 
identify blocks, and each processor sends only contributing blocks. Rather than a round- 
robin approach (where core N sends to N+1, N+2, etc) the code now has an on-demand 
communication pattern. Scaling has been demonstrated again up 256 cores, with a three 
times speed up for increasing the core count by four times, onetep switches between 
sparse and dense matrix algebra depending on the filling of the matrices in question. 

SIESTA can be run in parallel, though in its normal implementation only on small 
numbers of processors (scaling up to 32 or so). However, recent work 420 has changed 
the parallelisation. The code originally divided up the cell by columns on the integration 
grid; the new implementation uses recursive bisection and weighting based on the 
number of neighbours. It also schedules communications to avoid unnecessary all-to- 
all communication. The resulting code shows an improvement for scaling 262 water 
molecules on 8-128 processors from 24% speed-up (old scheme) to 52% (new scheme — 
for an inhomogeneous distribution). 

The divide and conquer method has been parallelised 421 following the obvious 
route: the individual subsystems are assigned to processors, with the limiting 
parallelisation given by subsystem size. In the implementation described, the system 
data was copied to all processors, with the intended aim being small numbers of 
processors. The method has also been included in a massively multi-scale MD approach 
with divide and conquer as the embedded quantum method 256 258 . This approach has 



scaled to systems with 1.2 million quantum atoms and billions of classical atoms 257 



Orbital-free DFT has also been parallelised 422 
grid evenly between processors. 



by dividing up the real-space 



Since the real-space data is purely real, only the 
positive half-sphere of reciprocal space is needed; these points are also divided evenly 
between processors, but the sparse grid involves a map. Load balancing requires 
consideration of a compromise: if it is performed by points, this gives ideal balance, 
but a communications penalty; by line gives better communications, but worse balance; 
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by plane gives good communications by potentially poor load balance. 
4-4 ■ Structure Relaxation 

Before considering details of a subset of the applications which use linear scaling 
methods, we note that the most common task required of a code is to relax the atoms 
to their lowest energy configuration. A point which is becoming more widely discussed 
is the scaling of relaxations: just because the electronic ground state can be found in 
linear scaling time does not mean that a relaxed atomic structure can be found with the 
same scaling. Indeed, the difficulty of relaxing a system with low and high frequency 
phonon modes (or equivalently with very different length scales in the curvature of the 
energy), will depend on the ratio of the largest and smallest eigenvalues of the Hessian. 
It seems likely that, in general, the number of iterations to find a relaxed structure 
will increase with system size if relaxation is implemented naively; simple arguments 
indicate that, for conjugate gradient relaxations, the number of iterations with increase 



with the largest linear dimension in the system 423 . 

There have been a number of proposals to alleviate this problem. Preconditioning 
or relaxing low frequency relaxations using insights from elasticity theory has been 



suggested [423 424] . One approach 423 transforms the atomic forces onto a discrete 



grid (by smearing them) and uses multigrid approaches to solve for the Hessian without 
diagonalising. The resulting method shows that the number of force evaluations (and 
hence time per atom required to find relaxed structure) is independent of system size on 
scaling from 500 to 800,000 atoms of silicon with an interatomic potential. An alternative 



approach 424 uses a model Hessian either to precondition conjugate gradients or as the 
input Hessian for a quasi-Newton method (in both following exact diagonalisation or 
inversion, though this could presumably be made linear scaling if needed). The model 
Hessian is effective at improving convergence, though is not tested on increasing system 
size. 

The obvious alternative approach, particularly for large molecules, is to use internal 
coordinates (bond lengths, angles and dihedral angles) instead of cartesian coordinates. 
A linear-scaling transformation to and from internal coordinates has been proposed 



425 , which uses Cholesky decomposition on a sparse transformation matrix (which is 



shifted to avoid problematic zeroes) and iterated. This method has been harnessed to 



allow a linear scaling relaxation method 426 where the overall problem is decomposed 



into a set of 3N independent relaxation problems (using a weighted fit). The method 
is applied to a protein (with 263 atoms) and is efficient, though scaling is not tested 
explicitly. 

Relaxation can be made more efficient, and less dependent on system size, if each 
step requires a shorter time to relaxation. A method to allow extrapolation of the 



density matrix from a previous step in a relaxation to the current one 427 uses the 
trace-correcting formalism to extrapolate. The method can be applied very efficiently 
to a local perturbation, with only the overlap matrix affected (and the density matrix 
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extrapolated); it also improves convergence for standard geometry optimisation. 



The final approach extends the Fragment Molecular Orbital method (Sec. 3.2.2) 



to geometry optimisation. Similarly to the perturbation approach just described, the 
system is divided into frozen, polarisable and active domains: the frozen domain 
is calculated just once, at the start, and not updated; the electronic structure of 
the polarisable domain is updated in response to changes in the active domain, but 
the atomic positions are frozen; while the both the atomic and electronic degrees of 
freedom for the active domain are updated at each step (note that this is a form of 
embedding 428]). The method has been applied to prostaglandin synthase in complex 
with ibuprofen, with over 19,000 atoms. 



5. Implementation and Applications 

5.1. Implementations 

Implementations of linear scaling methods can be split into two camps: first, those that 
build on existing codes and methods and simply add a new solver to the self-consistent 
loop (these often include quantum chemistry-based codes and methods, such as the 



fragment methods described in Sec. 3.2.2); and second, those that create an entirely 



new implementation, often involving careful parallelisation (as described in Sec. 4.3). 
At the start of this section, we will briefly survey the second class, though this is not an 
all-inclusive list, and more details can be found in papers referenced above describing 
approaches to linear scaling. 

The main codes that we are aware of with linear scaling functionality are, in 
alphabetical order: Conquest |429] ; ErgoSCF |430] ; femteck [43||237]; FreeON 

ONETEP 



431 



432 ; OpenMX 433 



434] ; and siesta 435 . We will discuss 



PROFESS 

the basis sets used to represent the localised orbitals (or support functions or Wannier 
functions) as well as the linear scaling kernel. 

SIESTA [59,76,193,436 was one of the earliest linear scaling codes made widely 
available, and is still in widespread use today, though most of the applications use exact 



diagonalisation to find the ground state. The code uses pseudo-atomic orbitals (Sec. 2.3) 



as a basis, and the method for calculating forces and stresses is clear 59 . The linear 



scaling kernel used is the Kim, Mauri and Galli functional (Sec. 3.2.1 ) which ameliorates 



the convergence problems of the OMM functional, and recently [255] a divide-and- 
conquer kernel has been implemented. Two different linear scaling implementations of 



the vdW-DF functional for van der Waals corrections 437,438 have been developed 



within SIESTA. The code is freely available for academic use. 



The ONETEP code 212,412,419,439 443 represents the density matrix in terms 



of non-orthogonal generalised Wannier functions (NGWFs) [444 , which are in turn 
represented by periodic sine functions (which are periodic, bandwidth-limited delta 
functions) on a fixed, real-space grid. The kinetic energy (and preconditioning) 
are calculated using local Fourier transforms, known as the FFT box method 445] . 
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The linear scaling kernel 212 is the LNV method 201 204| , and the method used 
for calculating forces has recently been published 446| (it is worth noting that the 
contribution from the local pseudopotential scales with A^^, but with a small enough 
prefactor that it only starts to become significant around 15,000 atoms in the published 
scaling tests). Empirical van der Waals interactions have been implemented 447 and 
extensive tests comparing to plane-waves have been carried out 96 . The code is 
commercial, though can be obtained for a modest fee by academics. 

The OpenMX code 254,295 uses a carefully constructed set of pseudo-atomic 
orbitals 63,64 as a basis for the density matrix, and, similarly to siesta, concentrates 
on exact diagonalisation and other efficient routes to the ground state 276 , though 



recursion-based linear scaling functionality is available 254 . In particular, the code uses 



an ingenious combination of divide and conquer and recursion methods. An extensive 
set of extensions has been implemented, including non-collinear spin, constrained DFT, 
LDA-I-U, Wannier function construction and polarisation calculations. The code is 
available freely. 

The FreeON (formerly MondoSCF) code [98 p^8pIIpl3j|115||209pl6l comes from 
the quantum chemistry community, using standard basis sets to represent the density 
matrix, with linear scaling coming from trace-correcting methods 234 . Forces are 



calculated within the standard framework 448,449 . Recent extensions have considered 



perturbation theory and time-dependent DFT. The code is freely available. 

The ErgoSCF code 104 also comes from the quantum chemistry community, 
using Gaussian basis sets to represent the density matrix and trace-correcting methods 
to find the ground state. The code is parallelised for shared memory machines, and is 
freely available under the GNU Public Licence. 

The Conquest code [66|[67l [2l6|[2l7l[4lo[[4lI|[4l^ can use either pseudo- 

atomic orbitals or systematically improvable blip (B-spline) functions to represent the 
density matrix, and the LNV approach to linear scaling, following initialisation with 
McWeeny iteration; exact diagonalisation via ScaLAPACK has also been implemented. 



The forces are calculated as exact derivatives of the energy with linear scaling time 454 



The code implements a number of standard features, including GGA with non-self- 
consistent forces 110 , and recently spin and exact exchange. It also includes constrained 
DFT, which has been shown to converge in a linear scaling manner 455 . The code is 
in late-stage beta release, and will be freely available during 2012. 

The FEMTECK code 43 237 uses finite elements to represent the Wannier functions, 
and the augmented OMM method 237 to find the ground state in linear scaling time. 



The code has been applied to liquid ethanol [4^ and a fast ion conductor 456 with 
stability and efficiency. 

The PROFESS code |332||340||34l||422) is an orbital-free DFT method (Sec. [3^5] ) 
which represents the electron density directly on the grid, and has implemented a 
number of different kinetic energy functionals. As with all OFDFT codes, only 
local pseudopotentials can be used, which, along with the limitations on functionals, 
restricts its use to simple metals and some properties of semiconductors. Nevertheless, 
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extremely large systems can be addressed: recent developments in ion-ion and electron- 
ion calculations have allowed the efficient parallelisation of OFDFT, with a benchmark 
calculation on a million atom sample of perfect bulk Al demonstrated 341 , 434 . The 
code is freely available. 



5.2. Applications 

In this section, we survey the applications of the linear-scaling methods explained in 
previous sections. There are many research areas where large-scale DFT calculations 
are expected to play an important role, and recently calculations for actual scientific 
research have been emerging. However, the applications of linear-scaling methods are 
still rather limited. It is necessary, at this stage in the development of the methods, to 
investigate the availability, accuracy and efficiency of the techniques employed in each 
study. It is also not obvious what kind of quantities can be calculated by such large-scale 
DFT studies. So far, different methods have been used depending on the system or the 
phenomena of the research area. From the examples of the applications introduced here, 
we would like to summarise what has been found so far, such as the size of the systems 
which can be treated, robustness of the calculations, and the accuracy of the calculation 
method, including the quality of the basis sets used in the calculations. 



5.2.1. 0{N) calculations on biological systems One of the most promising targets 
for 0{N) DFT study are biological systems. In spite of the complex structures of 
large biomolecular systems, they provide atomically controlled systems for realising 
surprisingly sophisticated functions. With the rapid increase of experimental 
information, the demand for accurate modelling of large biological systems is also 
growing. It is a challenge to understand the mechanism of such phenomena from the 
atomic scale, especially with quantum mechanics. 

So far, most theoretical studies on biological systems from atomic scale have 
been done using classical force fields. Although these methods are powerful tools to 
investigate various phenomena in biological systems, they have a serious problem that 
the calculated results sometimes depend strongly on the parameters used for interatomic 
potentials. Different force fields or even different version of the same force field can show 
qualitatively different results. In addition, it is quite difficult for the methods to treat the 
phenomena of bond forming or breaking properly. Thus, hybrid approaches like ONIOM 
or QM/MM (quantum mechanics / molecular mechanics) methods are often used for 
the study of chemical reactions, like enzyme reactions in biology. With these methods, 
the important region where chemical reactions take place is treated by a method based 
on quantum mechanics, and the dynamics or mechanics of the atoms in the surrounding 
region is calculated using a classical force field. However, it is sometimes uncertain 
how to define these two (or more) regions and it is not clear how accurate the method 
is, especially when the QM region is not large enough. Obviously DFT calculations 
on the entire or the sufficiently large region of complex biological systems are of great 
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Figure 8. Isosurfaces of the electron density of dry DNA calculated using SIESTA 



457 for: (a) the eleven highest-occupied molecular orbitals; (b) the eleven lowest- 
unoccupied molecular orbitals; and (c) the eleven highest occupied orbitals following 
a single mutation. Reprinted figure with permission from P. J. de Pablo et ai, Phys. 
Rev. Lett. 85, 4992 (2000). Copyright (2000) by the American Physical Society. 



With such demands, there was already a report in 2000, showing the 0{N) DFT 
calculations on a dry DNA model system containing 715 atoms |457| . The system is 
a periodic double helix DNA consisting of eleven guanine-cytosine (G-C) base pairs in 
the unit cell. For this system, 0{N) calculations were performed using the Siesta code 
to employ the structure relaxation mainly with a double-^ basis sets. Using the relaxed 
structure, a conventional diagonalisation technique was also performed to calculate the 
Kohn-Sham energy and orbitals, and to confirm the accuracy of the forces with the 0{N) 
method. The results for the simple polyG-polyC system show that the topmost valence 
bands are made by the eleven highest occupied molecular orbitals of Guanine bases, and 
they are connected along the direction of the DNA chain; the eleven highest-occupied 
molecular orbitals and the eleven lowest unoccupied molecular orbitals are illustrated 
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in Fig. |8| They further investigated the effects of a defect structure by introducing the 
mutation of one G-C base pair, aiming to mimic the electronic structure of A-DNA, 
which has a random sequence of the DNA bases. Following the same procedure, the 
orbitals of this system were calculated and they found that the orbitals showed cleavage 
at the point where the swap was introduced. This suggests that the resistivity of A DNA 
should be very high, consistent with the measurements of the electron transport of the 
system. Although some of the results relied on a conventional method, this pioneering 
work clearly shows that 0{N) DFT study will be powerful in the study of biological 
systems. Similar hybrid works, combining 0{N) and conventional calculations on DNA 



have also been performed 276 , 458 




Figure 9. (a) Snapshot of the structure of hydrated ten-mer of DNA, and the 
calculated atomic forces on (b) Ist-lOOth and (c) 101st-200th atoms of DNA by 0(N) 
DFT and AMBER force field calculations. In (b) and (c), green bars on the horizontal 
axis show the atoms of phosphoric acids. 

Biomolecular systems usually have large HOMO-LUMO gaps and thus the 
electronic structure is expected to be well localised. In this respect, the systems should 
be generally suitable for 0{N) DFT studies. This aspect was clearly demonstrated in a 
theoretical study on a test DNA system using the Conquest and the DMM method [67] 
The system investigated in their study is a B-DNA decamer (5'-d(CCATTAATGG)2-3'), 
with 932 water molecules and 9 Mg^"*" counter ions. Its structure is shown in Fig. |9](a), 
including 3,439 atoms in total. The atomic positions were prepared from a snapshot 
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taken from an MD simulation with the AMBER classical force field. They showed that 
the DFT method can be applied to a system of this size, and the accuracy of the DMM 



method is reported to be surprisingly good. Figure [10] shows the dependence of total 
energy on the cutoff length Rl, calculated in a non-self-consistent way with a minimal 
basis set of PAOs. Here, both dry and hydrated DNA systems are calculated and the 
dry system is made by removing all water molecules from the system shown in Fig. |9](a). 
Since the dry system includes only 643 atoms (634 atoms for DNA and 9 Mg atoms), 
it was possible to also employ an exact diagonalisation method to test the convergence 



of the linear scaling method. The results are shown in Figure 10 The energy obtained 
by diagonalisation is shown by a horizontal dotted line, and the total energy calculated 
with various cutoffs using the DMM method is plotted by a red line with circles. These 
results clearly illustrate that the total energy by the DMM method converges very 
rapidly. The error at i?i,=8.47 A is already 0.046 eV (7.2 xlO~^ eV/atom) and, if we 
increase Rl to 9.53 A, the error becomes only 0.0078 eV (1.2 xlO~^ eV/atom). This 
rapidly convergent behaviour can also be observed in the calculation of a DNA system 
hydrated with many water molecules. The total energy of this system calculated as a 



function of Rl is plotted by a blue line with triangles in Fig. 10 Note that the energy 



scale for this system (right in Fig. 10) is same as the one for the dry system (left in 
the figure), though shifted. The convergence of this system is also very rapid and the 
total energy at Rl=13.23 A can be considered as a well converged value. Then, the 
error at Rl=8A7 A is 0.094 eV, which corresponds to 2.7 xlO~^ eV/atom. If we use 
i?L=9.53 A the error becomes 0.017 eV for 3439 atoms (4.9 xlO"*^ eV/atom). With 
such accuracy, it is possible to discuss the difference of the total energy induced by a 
local reaction in a system containing several thousand atoms. There are reports showing 
that similar accuracy can be obtained also in other linear-scaling methods 276 . These 



results suggest that 0{N) DFT methods will be able to provide quite accurate results 
in molecular biology. 
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Figure 10. Dependence of total energy on density matrix cutoff, Rl, for DNA without 
water molecules (red line and circles, left axis) and full system (blue line and triangles, 
right axis). The dashed line shows the exact diagonalisation result for DNA without 



water. From Ref. 67 . 



The FMO method (described in Sec. 



3.2.2) is efficient and accurate for biomolecular 
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systems, and already has many published examples 272 . One of the largest problems 



tackled with the approach was a calculation of active sites on influenza A viral 
haemagglutinin 459 . The calculation used a polarisable continuum model and about 



24,000 atoms of protein. They found that the binding of a class of chemicals known 
as sialosides is not regulated by allosteric effects (in other words the number of ligands 
bound does not affect binding affinity); this result can be used in design of drugs for 
coping with influenza pandemics. One of the advantages of the FMO method is that 
they can use DFT, as well as Hartree-Fock and post-Hartree-Fock methods, like the 
MP2 method. Another advantage is that the method can easily calculate the quantities 
by which we can discuss whether the interactions between two fragments are attractive 
or repulsive. This is called pair interaction energy decomposition analysis, and can give 
useful information for the ligand-protein interactions. There has been an attempt to 
apply the FMO method to silicon systems 460 , but we note here that an investigation of 
the MTA fragment method for 2D vr-conjugated systems 461 found that large fragments 
were needed for accuracy in these types of system. 

In the actual research on biological systems using DFT, it may be a serious problem 
that DFT functionals such as LDA or GGA cannot express van der Waals interactions 
correctly. However, there have been many attempts to solve this problem. A new class of 
exchange correlation functional which includes the vdW interactions has been proposed 
by Dion et al., called vdW-DF method. There are already many examples using this 
method and some revised version (vdW-DF2) has been recently proposed [462 . There 
is also a more efficient method, called DFT+D method, which has been applied to many 
biological or organic systems 463 , 464 . This is a simple method where total energy and 
forces are calculated by adding empirically parametrized interatomic potentials to the 
total energy calculated by DFT. In the early version of the method, the parameters 
used in the method only depended on the species of atoms and did not vary in different 
environments. Recently the methods to improve the transferability, by changing the 
parameters using the charge density calculated by DFT method or from the analysis of 
the local coordination numbers, have been proposed 464 , 465 . The results obtained by 
the new methods reported so far are encouraging. These methods, vdW-DF or DFT-D, 
can be easily applied to the 0{N) methods and there are already some reports for their 
implementation to the 0{N) codes 437,447 . The vdW interactions are usually very 



weak and seem to be more effective for larger systems, thus more important in 0{N) 
DFT calculations. 

Another serious problem in the DFT study on biological systems is that the 
simulation time of molecular dynamics is very short. One candidate to overcome this 
problem at present is to use the 'force matching' method 466 , which aims to refine the 
classical interatomic potentials using the DFT results. To employ this method in the 
complex biological systems, it is necessary that we can calculate the total energy and 
atomic forces for very large biological systems using DFT. 

As mentioned previously, DMM method can calculate the atomic forces easily 
and accurately. For the system in Fig. lota), the total energy and atomic forces are 
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recently calculated with DZP basis sets and compared with those by the AMBER force 



field 467,468 . The calculated forces acting on the 1st to the 200th atoms, from a part 
of DNA, are shown in Fig. |9](b) and (c). In the figure, the green bars on the horizontal 
axis show the indices of the atoms forming the phosphoric acids. The result shows that 
the atomic forces calculated by these two methods agree well for most of the atoms. 
However, we can see that the agreement for the atoms in the phosphoric acids is much 
worse, compared with the forces on atoms of DNA bases. The deviation in the forces on 
the phosphoric acid part seem to depend on the position of the Mg counter-ion close to 
this part. We can expect that such DFT results would be useful to revise the accuracy 
of classical force fields. All of these results suggest that 0{N) DFT (or other quantum 
mechanics) methods will be able to play a significant role soon in the study of complex 
biological systems. 
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Figure 11. (a) Structure of Ge hut clusters on Si(OOl). (b) Structure of the wetting 
layer of Ge on Si(OOl), showing the 2 x N reconstruction. 



5.2.2. Order-N DFT study on nanoscale structures of Ge islands on Si(OOl) Another 
class of targets which can greatly benefit from 0{N) DFT calculations are nano 
materials or nano science. Here we would like to introduce the study on the energetics 
of a nano-structured system, Ge three dimensional islands grown on Si(OOl). The 
Ge/Si (001) system has been extensively studied because it is a prototypical example of 
hetero-epitaxial Stranski-Krastanov growth. It is also technologically important because 
of the formation of organised quantum dots. Many experimental studies have been 
reported so far, and they confirm the following results. When Ge atoms are deposited 
on Si(OOl), growth initially occurs layer by layer, up to a critical thickness of about 
three monolayers (ML). Strain due to the lattice mismatch is relieved by the formation 
of regularly spaced rows of dimer vacancies in this two-dimensional (2D) structure, 
resulting in the 2 x N structure. Deposition of further Ge atoms leads to another 
strain-rehef structure, 3D pyramid-like structure, called hut clusters |469j, whose four 
facets are well estabhshed to be {105} surfaces. The typical side length of the hut 
cluster is about 150 A. Recently, all atom DFT calculations on the hut cluster including 
a substrate were performed using an 0{N) technique to study the transition from the 



2D to 3D structures 470 ,471 . Here, we introduce this 0{N) DFT study. 
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Figure 12. Convergence of total energy during structural relaxation of Ge hut cluster 
on Si(OOl), which contains over 20,000 atoms. From Ref. [471| 



The stability of the 3D structures grown on surfaces is usually governed by the 
competition between the release of the strain energy from the formation of a 3D structure 
and the energy increase due to the larger surface area of the facets on the surface. 

50 far, theoretical approach on the Stranski-Krastanov growth mode has usually used 
continuum elasticity theory for the first part, and employed DFT calculations for the 
latter. However, a unique situation exists in the Ge/Si(001) system. The surface 
structure of the strained Ge (105) was clarified by a combination of STM and DFT 
studies and the DFT calculations show that the strained Ge (105) surface is much 
more stable than the strained Ge(OOl) surface. Therefore, even though the surface 
area increases, the contribution from the surface energy is extremely small or may be 
lowered by the formation of facets. If this is the case, we need other contributions to 
reproduce the 2D-3D transition. It is important to consider the energy contributions 
from the wetting layer, as well as the edges where the facets meet each other. In 
addition, as the area of the facets of the experimentally observed Ge hut cluster is not 
large, the evaluation of the surface energy using conventional DFT is also doubtful. For 
these reasons, the validity of previous theoretical approaches is uncertain, especially 
for small hut clusters. To overcome these problems, it is necessary to employ all-atom 
DFT simulations on this system, including the entire Ge hut cluster, wetting layer and 

51 substrate. Since the number of atoms exceeds a few thousands even for small hut 
clusters, we need a linear-scaling technique to employ DFT calculations on such large 
systems and it has been recently shown that the calculations are possible with the 
Conquest code. 

Before the work on full systems, the accuracy of the computational methods was 



thoroughly examined by the calculations on the strained Ge (105) surfaces 470 . It 



should be noted that even semi-empirical TB method, though it is based on the quantum 
mechanics, is found to be not accurate enough for the energetics of the surfaces in this 
system. Using the results, 0{N) DFT calculations on the 3D Ge hut clusters have been 
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employed 471 . At the non self-consistent level, structural optimisation on systems 
of different sizes were employed using a standard CG method. The largest system 
calculated in this work, shown in Fig. 11 , contains ~ 23,000 atoms. As we can see in 



Fig. [12} the structure optimisation is robust and accurate enough even for such large 
systems. For the 3D hut clusters, three structural models having different facet and edge 
structures are investigated. Furthermore, the structure of 2D 2 x N reconstructions 
(A^ = 4, 6 and 8) and its total energy are calculated for comparison using the same 
calculation condition. The energy of these structural models as a function of coverage 



is illustrated in Fig. 13 It shows that the 2D structure is more stable for small coverage 
of Ge atoms, but the 3D hut structure becomes stable when the coverage exceeds 2.7 
ML. Interestingly, this coverage agrees with the experimental value showing the 2D-3D 
transition. This 0{N) DFT study has succeeded to clarify the energetics of the 3D 
hut cluster systems, but the kinetic aspects are also important to simulate the actual 
growth. In this respect, since the 0{N) DFT study can treat the entire system, it 
is also possible to work on the dynamical aspects by putting additional Ge atoms on 
the optimised structures. Such works had been unavailable so far and we expect many 
fruitful information would be obtained by 0{N) DFT studies in the near future. 
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Figure 13. Comparison of energetics for different 2D (dotted lines with filled symbols) 
and 3D structures (solid lines witli open symbols or crosses) for Ge on Si(OOl). From 
Ref. (471 



5.2.3. Other examples. Here, we survey other applications. Of course, it is impossible 
to show all examples, but we try to show various areas of applications using different 
0{N) methods. 

First of all, an excellent problem for linear scaling methods is a one dimensional 
system; there is also considerable interest in the transport properties of molecules 



472 , 473 . One approach to calculating transport for large systems 474 divides the 



system into layers with local coupling between them, and then uses normal DFT 
calculations to find the electronic structure of the layers. The resulting method is linear 
scaling (and related to divide-and-conquer techniques. Sec. 3.2.2). This approach was 
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applied to defects in carbon nanotubes (CNT), and the localisation lengths associated 
with them. By examining systems with up to 25 defects they predict that Anderson 
localisation will be observed in CNT at room temperature. The method allows systems 
hundreds of nanometers long to be simulated. 

Divide-and-conquer-like algorithms have found a relatively wide application. The 
ONIOM method 475 was used, along with successive geometry updates on different 
fragments, to relax the 150,000 atom photosystem I trimer 476 . The resulting relaxed 
structure allowed quantitative identification of the location of key hydrogen atoms, as 
well as the insertion of missing atoms and correction of misaligned features from X- 
ray diffraction data. A similar approach, with divide-and-conquer embedded within a 
multiscale modelling framework 477 , allows for massively parallel deployment. Tests 
have been carried out on 258 : alumina up to 11,796,480 atoms on 131,072 Blue Gene/L 
processors (though with a rather coarse grid of 0.4 oq); MD for 432 atoms of Rb, which 
yields good agreement to X-ray pair distribution data; MD on 512 atoms of graphene to 
look at vibrational spectra; and calculations on the electron affinity of a CdSe nanorod 
with 432 atoms. The same method has been used to simulate a thermite reaction with 
1152 atoms (Al/Fe203), performing MD at 2000K for 5ps. The key result was a metal- 



oxygen flip mechanism that enhances diffusivity 478 



Orbital-free DFT (Sec. 3.2.5) is generally used for metallic systems. The PROFESS 
code has been used extensively to model Al, such as the energetics and mobility 
of vacancies 479 , Al nanowires 479,480 , and crack tips in Al j481| . In the first 
example, vacancy formation and migration energies are calculated using cells up to 
500 atoms for tri-vacancies. They find that while nearest-neighbour vacancy pairs are 
unstable, next-nearest-neighbour vacancy pairs are stable, and predict that vacancy 
clusters preferentially grow through next-nearest-neighbour vacancies. For the second 
example, Al nanowires (up to 16,770 atoms) with 1-8 nm diameter and up to 20nm 
long are stretched to examine elastic and plastic behaviours. They find that the elastic 
deformation is qualitatively similar, but quantitatively different with respect to the 
diameter; thinner nanowires are more compressed relative to the bulk fee structure. On 
the other hand, clear size dependent behaviour is observed in the plastic region. Partial 
slip as mechanism for plastic deformation is only seen for 4nm wires and above, while 
amorphous deformation is seen in narrower wires. These are illustrated in Fig. 14 In the 
third example, they also calculate the system with the embedded atom model (EAM) 
method. With the OFDFT they treated the system up to 7,800 atoms. They find 
qualitative differences of the OFDFT result to EAM for one orientation, in particular 
in how emission of twinning partial dislocations changes crack length; the difference 
is likely to be down to the surface energies being incorrect for EAM. There is also a 
quantitative difference in the onset of emission of partial dislocations with load. 

For the study of vacancies in aluminium, there is another OFDFT study which 
combined finite element modelling and coarse graining with OFDFT 482 . In this 



method, far from the region of interest, they use the energy of distorted bulk Al based 
on the local environment to coarse-grain. The authors find that lO'^-lO^ atoms are 
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Figure 14. Slip planes of a 4 nm diameter Al nanowire formed upon loading one step 
past the elastic limit. The top image shows the non-fcc interior atoms for the entire 
20 nm long nanowire ([111] axis extending from left to right). The bottom two images 
show all the atoms in three-layer cross sections of the nanowire at (a) 4-10 A and 
(b) 164-174 A along the wire length. Light blue atoms have hep structure; gray have 
fee structure; and white have unknown structure. Reprinted with permission from L. 
Hung and E. A. Carter, J. Phys. Chem. C 115, 6269 (2011) [480]. Copyright 2011 
American Chemical Society. 



needed for convergence of monovacancy formation energy. Interestingly, they find a 
change of sign in interaction energy for the di-vacancies, when they increase the size of 
the system from 32 atoms to larger. However, this result does not seem to be consistent 
with the previous result by PROFESS. In particular, the results for mono-vacancy 
formation energy, while giving almost the same value as the PROFESS result 479 , show 
different behaviour with cell size. There may be some effects from the coarse graining 
approximation, however much of the difference comes from the boundary conditions 
used, particularly for the electrostatics, and not from the OFDFT component (which 



seems to be consistent between the two implementations) [483 



The orbital minimisation method (Sec. 3.2.1) has been applied to molecular 
dynamics (MD) calculations of liquids. It has been used to test the effect of cell size on 



diffusivity of liquid water 484 : the diffusivity of liquid water calculated from DFT is 
too low over long time scales. Using MD simulations with up to 128 water molecules, 
the study found no appreciable size effects (between 32 to 128 molecules), nor any effects 
due to parameters chosen (including the localisation radius and basis chosen). The final, 
detailed study used exact diagonalisation with a DZP basis optimised for water, with 
basis set superposition errors corrected. 

The augmented OMM has been used to perform challenging and scientifically 
relevant simulations with an 0{N) code. It has been applied to molecular dynamics 
simulations of liquid ethanol |44]; this system requires large system sizes (of the order 
of 10'^ atoms to ensure the structure of hydrogen bonded chains is correct). Using seven 
localisation regions for each molecule (centred on the bond centres with one on the 
oxygen) with radii of 12ao energy is conserved extremely well, and the computation cost 
is reduced by a factor of 4.6 compared with a conventional method. The comparison 
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with experimental data is impressive: the radial distribution function shown in Fig. 15 
compares well; the self-diffusion coefficient is close (8.2x 10~^ cm^s^^ for simulation 
compared to 1.1x10"^ cm^s~^ from experiment); and the red shift in 0-H vibration 
mode due to hydrogen bonds in the liquid agrees as well (~350cm~^ in simulation 
compared to 320 ± 10cm~^ in experiment). The same method has been applied to 
calculations of Li'*' conductivity in LiBH4 [456 . Using 100 unit cells (each with two 
LiBH4 units, for a total of 1,200 atoms) and localisation regions of 10 ao for Li and 14 ao 
for B, the CPU time was reduced by a factor of 2.5 with no reduction of accuracy. The 
authors found that the high ionic conductivity results from a metastable interstitial site 
generated by a splitting of the units in the c direction, which is a new mechanism for 
ion conductivity. 




Figure 15. X-ray weighted radial distribution functions of liquid ethanol calculated 
from ab initio 0(N) molecular dynamics simulations (brown line) and obtained from 



experiments(blue line). From Ref. 44 



The LNV method is implemented in both Conquest and onetep; applications of 
Conquest have already been described above in Sec. 5.2. 1| and 5.2.2 A study of the 



formation energy of vacancies in alumina with ONETEP 409| (a-Al203) found, as would 
be expected, significant simulation cell size dependence. Using cells from 120 atoms up to 
3240 atoms, a linear correlation between defect formation energy and Madelung energy 
was found, and used to extrapolate an infinite cell size result. The simulation used a 
density matrix range of 24 ao and radii of 8ao for the Wannier functions, though the 
convergence of the energy with these values is not shown. The same approach has been 
used to calculate the electronic structure of silicon nanorods with hydrogen passivation 
on the surface 



485 



The calculations used a kernel radius of 24 ao and a Wannier 
function radius of 7ao. Nanorods containing up to 1,648 atoms were modelled, though 
the geometry optimisation was performed using an ab initio tight binding code. The 
density of states and states around the band gap are found using exact diagonalisation. 

Krylov subspace methods (Sec. 3.2.3) have been applied with tight binding 
Hamiltonians (rather than ab initio) to study cleavage in silicon. The cleavage of 
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nano- crystalline silicon 486| (for systems up to 100,000 atoms) showed formation of 



(100) facets with a partial reconstruction, and explored the effects of size difference 
and competition between bulk and surface energy. Cleavage of bulk Si along (111) 



plane 487 used 11,096 atoms (periodic in one direction only). The formation of a (2x1) 
reconstruction as well as steps during cleavage was observed, with a similar interplay 
between surface and bulk terms. 

As we have mentioned above, there are still not many scientific applications using 
0{N) DFT methods. However, as we can see from the examples surveyed in this section, 
there are now real simulations using 0{N) DFT methods and the number of applications 
is growing rapidly. As the codes develop in robustness they will be more widely used, 
and more experience will emerge in important areas such as convergence and basis sets. 
This will in turn encourage confidence in results and more applications. 

6. Conclusions 

In recent years, the trend in computing has been a dramatic increase in the numbers 
of cores on a processor, and to massive numbers of processors in high-performance 
computing centres; the recent emergence of hundreds or thousands of processors on 
graphics processing units has taken this trend even further. As computational science 
is driven by the hardware base available, we have seen a strong movement towards real- 
space basis methods as a route to exploit hardware efficiently, though the eigenvalue 
solvers have retained traditional scaling rather than shifting towards linear scaling. 

Since the first DFT linear scaling methods were proposed fifteen to twenty years 
ago, it can legitimately be said that development in 0{N) methods has been slow. 
However, in recent years, real progress towards applications has been made (for instance, 
see the proceedings of a CECAM workshop held in 2007 488]). The reasons for this 



slow development are easily understood: linear scaling introduces more parameters 
and sources of instability; standard methods have developed rapidly in efficiency and 
robustness; parallelising linear scaling methods is complex. 

Nevertheless, we have reached a point where approximate linear scaling methods 
(such as divide-and-conquer and orbital-free DFT, which are hard to pursue to high 



accuracy) are producing real applications, as discussed in Sec. 5.2 Methods which have 
the capacity for exact behaviour are now at the point where they are more efficient 
than conventional methods for systems over about a thousand atoms, and are starting 
to demonstrate real applications with predictive capability. 

Linear scaling codes can seem more complex to use than standard codes; at the 
moment there is certainly less expertise in their use and appropriate convergence criteria 
compared to standard methods. One of the key advantages of a plane-wave basis set 
is the simplicity of convergence: the cutoff energy offers a single, simple variational 
parameter; linear scaling methods which use a variational basis set (such as blips or 
psincs) have a directly equivalent grid spacing. Atomic-orbital or Gaussian based codes, 
whether conventional or 0{N) methods, by contrast, have basis sets which are hard to 
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converge in a variational manner. Linear scaling methods do have a cutoff both on the 
density matrix and on the localised orbitals, though again cutoffs on local orbitals are 
not unique to linear scaling codes. We note that these codes do not in general include an 
integral in reciprocal space over the first Brillouin zone, and therefore remove the need 
to converge the k-point mesh. Linear scaling methods do introduce an extra convergence 
criterion during minimisation (one for the solution of the density matrix and another 
for the localised orbitals). So we see that, while apparently more complex, there is 
no reason why linear scaling methods cannot become as widely used as conventional 
methods. With more researchers working on practical calculations using these methods, 
the field will rapidly improve, in the same way that the field of ab initio calculations 
changed in the years after the Car-Parrinello method was first proposed. 

Just as in conventional approaches, there is no consensus on appropriate basis sets 
(in particular whether atomic-like or variational is better). However, this should not hold 
the field back: a variational basis can be characterised in the same way as plane-waves 
or real-space grids, and the same cutoffs can be used. There is a widely-held assumption 
that double zeta (or double valence) plus polarisation basis sets are required for atomic- 
like representations, though without the consensus achieved in quantum chemistry on 
different basis set qualities and likely errors. This should come with maturity of the 
field. Even within variational basis sets, however, it is not yet clear how many orbitals 
are required when we impose localisation on the basis set functions: the smallest size, 
the same as the number of bands, carries potential convergence problems; a number 
equivalent to a minimal basis gives a good compromise between computational effort and 
variational freedom; some calculations have found that larger numbers (e.g. equivalent 
to the valence orbitals plus polarisation orbitals) are needed for accuracy, while other 
calculations achieve accuracy with the same number of orbitals as valence orbitals. As 
more calculations are performed, a deeper understanding will emerge. 

We now consider the challenges faced by the linear scaling community and future 
routes for development. Naturally these are personal choices, but they certainly 
represent important problems in the area. The first challenge is that of accuracy: how 
accurately can linear scaling methods reproduce exact methods, and with what accuracy 
can important quantities be calculated ? The question of accuracy (and convergence) 
becomes more important when considering energy differences in large systems, which 
is a natural area for applications of linear scaling codes: tight convergence will be 
required, for instance, when comparing different structures in biomolecules. There have 
been some investigations already in this area (showing energy difference convergence 



for Ge nanostructures on silicon 471 and for solvated DNA 67 , comparing absolute 



energy convergence between exact and linear scaling methods for bulk silicon 96 , and 



comparing performance for purification and DMM methods for a linear alkane 192 



as well as work on error control within linear scaling methods 235| ) but more of these 
studies are needed. It is becoming clear that good accuracy can be achieved, though it 
naturally increases the computational time required; this accuracy is important for the 
future of these methods. 
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The second challenge is that of metallic systems: there is no clear route to linear 
scaling solution for systems with low or zero gaps and extended electronic structure. 
While methods such as orbital-free DFT offer some hope, and certainly allow large 
system sizes to be addressed, they do not at present give sufficient accuracy for 
quantitative calculations. There are approaches with reduced scaling over standard 
approaches, though they will slow down at some point. It may be that these methods 
offer the best route forward either until fully linear scaling methods are developed or 
until better orbital-free functionals are found. 

The third challenge, which is faced by codes in many other areas, is to make efficient 
use of new computing architectures, particularly given the shift towards petascale 
computing. Real-space methods in general are well placed to adapt to multi-core 
and GPU-based computing, but the communications patterns developed for previous 
generations of high-performance computing are not necessarily best suited for novel 
architectures. As linear scaling methods often use specific approaches developed for the 
problem, it may be harder to adapt than for other approaches using standard packages. 
However, this is an area where linear scaling codes can be extremely successful, and the 
effort should be made. 

The fourth challenge is to improve functionality while maintaining linear scaling 
behaviour. Recent years have seen DFT improved by adding features such as exact 
exchange and dispersion forces (also known as van dcr Waals interactions). Methods 
have been proposed to implement these with linear scaling, though as always adapting 
them to the approach used (and parallelising while maintaining linear scaling and 
efficiency) poses problems. Time-dependent DFT is certainly possible with linear 
scahng, as are certain parts of quantum Monte Carlo and MP2 calculations, but 
approaches such as GW cannot be adapted in their present form (though the influence 
and portability of Wannier functions is making many interesting approaches viable). 
Embedding of more accurate methods into DFT (or approximate DFT methods) may 
become important, and linear scaling approaches are ideal for this, starting from the 
locality of electronic structure. 

The final, and in many ways most important, challenge is that of applications to 
large systems, as already highlighted. Long timescales pose a challenge to all atomistic 
simulation methods, and larger systems give longer length scales which typically are 
associated with slower response times. However, this is a generic problem. Weak forces 
such as hydrogen bonding and van der Waals interactions are also important in large 
systems, particularly biological systems. While there are semi-empirical and ab initio 
methods for calculating these forces, accuracy and testing is vitally important. For the 
problem of preparing input for, and analysing output from, calculations with millions 
of atoms, this community can learn valuable lessons from the molecular dynamics 
community, and use existing tools from that area. Finally, as this is a new field, it 
will take time to understand which physical properties can be calculated reliably and 
efficiently. 

This survey of recent developments in linear scaling approaches has shown that we 
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stand at a fascinating point. In 1999, in a previous review of linear scaling methods, 
Goedecker wrote, "Even with 0{N) algorithms it will not be possible in the foreseeable 
future to treat systems containing millions of atoms at a highly accurate density- 
functional level using large basis sets, as would be necessary for certain materials science 
applications" [2]. However, we now have the first, true DFT calculations on millions of 
atoms |411 , and fully converged, highly accurate results on systems of this size will be 
obtained in the very near future. With this capability, there is a rich variety of systems 
and phenomena which can be tackled with accurate, linear scaling DFT techniques. 
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